
Avoid NULL - ingve
http://www.foundbit.com/en/resources/languages/cpp/expert/articles/cpp-avoid-null.html
======
mikekchar
Enumerating some techniques for avoiding NULL is useful, but it sidesteps the
real issue. It is important that programmers understand that avoiding NULL
will _not_ suddenly make your programs robust.

Generally speaking there is really only one case that you need to think about:
what do you do when a function needs to return a value and it has no value to
return. This often happens in error conditions, but it can also happen as a
result of poorly designed code (for example the ubiquitous "doUpdate()"
function that sometimes returns a result and sometimes doesn't, based on what
was passed to it). I will maintain that initialization is a special case of
this (I haven't initialized it yet, so I have no value to put in it).

However, more important than what you use to represent "I don't have a value"
values, is what you do with it if you get one. In several of the examples in
this article, a boolean is returned to indicate whether or not you have valid
data in the parameters passed by value. This may seem "safe" because you don't
crash, but there are often much more serious problems than crashing: data
corruption, for example.

Let's consider the case where I gather some data from somewhere and write it
to my database. In the case that my data collection failed, I certainly don't
want to write random data that was returned in the parameter. What happens if
the calling code neglects to check the boolean return value? This check is
identical (and could _literally_ be identical) to checking for NULL. The only
difference is that if I try to write an uninitialized, but valid data structure to
my database, it will work. If I try to dereference NULL, it will crash. In
many cases, crashing is preferred.
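
A minimal sketch of the scenario above, assuming a hypothetical `Reading` record, a `collect` function that reports failure through a boolean, and an in-memory vector standing in for the database. None of these names come from the article. It only shows that a dropped status check lets a default-initialized but perfectly valid record through.

```cpp
#include <string>
#include <vector>

// Hypothetical record type and "database", for illustration only.
struct Reading {
    std::string sensor;
    double value = 0.0;
};

std::vector<Reading> database;

// Status flag plus out-parameter. On failure `out` is left untouched.
bool collect(bool source_ok, Reading& out) {
    if (!source_ok) return false;
    out = Reading{"thermo-1", 21.5};
    return true;
}

// Careless caller: drops the flag, so a default Reading gets stored.
void store_unchecked(bool source_ok) {
    Reading r;
    collect(source_ok, r);  // return value ignored
    database.push_back(r);
}

// Careful caller: the same kind of check a NULL test would require.
bool store_checked(bool source_ok) {
    Reading r;
    if (!collect(source_ok, r)) return false;
    database.push_back(r);
    return true;
}
```

With `store_unchecked(false)` the vector ends up holding an empty record and nothing fails, which is the silent-corruption case. A null pointer in the same position would at least have had a chance to stop the program.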

Even when you return a NULL Object, you still need to think very hard about
what you want to do with it. What happens if someone tries to write a NULL
Object to the database? We know things are very, very wrong. We could just
skip it, but many times it would be prudent to halt execution of the program.

Too often I deal with programmers who blindly accept the advice that NULL is
evil without actually understanding the problem that NULL is intended to
solve. I have no problem with avoiding NULL (and in many cases it is a very
good idea). It is important, though, to understand that you _still_ have a lot
to do to write robust code.

~~~
tatterdemalion
The problem with null is that it's not a type error to use it as if it were a
non-null value. What you want is for any code that could operate on invalid
data as if it were valid data to be a compile time error.
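
As a rough illustration of that idea in C++17, assuming a hypothetical `Student` type and lookup function: wrapping the result in `std::optional` means the "maybe absent" value cannot be handed to code that expects a real `Student` without an explicit unwrap step.

```cpp
#include <optional>
#include <string>

struct Student { std::string name; };  // hypothetical type

std::string greet(const Student& s) { return "Hello, " + s.name; }

// Hypothetical lookup: only id 1 exists.
std::optional<Student> find_student(int id) {
    if (id == 1) return Student{"Ada"};
    return std::nullopt;
}

std::string greet_by_id(int id) {
    std::optional<Student> s = find_student(id);
    // greet(s);  // does not compile: optional<Student> is not a Student
    if (!s) return "no such student";
    return greet(*s);  // unwrap only after the check
}
```

The protection is partial: writing `*s` without the check still compiles and is undefined behaviour on an empty optional, so C++ only gets part of the way to what the comment asks for.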

------
purpled_haze
I'll never understand why people would rather complicate things in an attempt
to avoid null checks rather than provide null checks.

~~~
jschwartzi
Yes. Malloc is a really good example of this, where malloc returning NULL
indicates a fundamental resource is not available. If you dereference the
pointer returned by malloc under those conditions, your program crashes, just
like if you were to not catch an exception thrown at instantiation-time with a
reference type. You should absolutely use NULL to represent the absence of
some fundamental resource, like memory or a device. To the extent that a
device is memory-mapped, it's the same thing.

NULL is just a tool. It's just a way for you to express some property of your
API.

That said, some of the strategies outlined in the article could be very useful
for avoiding use of pointers. Use of a class type implementing an interface as
a "null type" is a very good strategy. Such a type could guarantee successful
instantiation, and so avoid the need to throw exceptions or otherwise crash.
You could architect an application with those strategies and completely avoid
the use of NULL and nullptr.

But where your program is dependent on a finite resource, you must have a way
to represent the absence of that resource. NULL and nullptr are convenient
ways to do that without a ton of extra code. So no, don't avoid NULL.

~~~
mikeash
An optional type's "none" value represents the same properties that NULL does.
The difference is that the optional type better expresses what's going on.

NULL breaks the type system. Now every pointer type is actually two types: the
normal type, and the NULL type. You can dereference or call methods on one,
but not the other, and only a runtime check will tell you what you're dealing
with. Any time you see:

    
    
        Object *
    

You cannot read that as "pointer to Object." You have to read it as "pointer
to Object, or NULL."

This causes three big problems. One is that it's way too easy to forget to
check for NULL. The second, related problem is that without a way to express a
non-nullable pointer type, you have to rely on documentation and convention to
express the difference. The third is that since only pointer types are
nullable, you have to reinvent the wheel any time you want to express the
concept of "null" for a non-pointer type.

To illustrate, you point out malloc which returns NULL when it runs out of
address space. What's the appropriate return type?

    
    
        void *
    

Now let's say you had some other memory allocation call which could never
return failure to the caller. Maybe it aborts execution on failure, or maybe
it retries continually until memory becomes available. What's the appropriate
return type?

    
    
        void *
    

Oops. Same problem with parameters. Here's a function:

    
    
        void f(void *ptr)
    

Are you allowed to pass NULL? Who knows, go RTFM. If the answer is "no," what
happens if you pass NULL anyway? The compiler certainly won't catch it, and
who knows what will happen.

NULL is "just a tool" but it's not a very good tool. It breaks the type system
in a fundamental and painful way.
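
One way to put the "can be absent" versus "cannot be absent" distinction into signatures, sketched in C++17 with made-up helper names: `std::optional` covers non-pointer values, and a reference parameter stands in for "must refer to an object".

```cpp
#include <optional>

// May fail, and the return type says so. (Hypothetical helper.)
std::optional<int> parse_digit(char c) {
    if (c < '0' || c > '9') return std::nullopt;
    return c - '0';
}

// Cannot fail from the caller's point of view: plain int.
int digit_or_zero(char c) {
    return parse_digit(c).value_or(0);
}

// Parameter that must refer to an object: a reference, not a pointer.
void increment(int& counter) { ++counter; }

// Parameter that may be absent: here the pointer type really means "or null".
void increment_if_present(int* counter) {
    if (counter != nullptr) ++*counter;
}
```

This does not help with an existing API such as malloc, whose `void *` return type stays ambiguous, but it shows the two cases no longer have to share one type.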

Edit: whew, HN's markup makes it kind of hard to talk about C pointers!

~~~
jschwartzi
Right, but in malloc's case, it needs to express something about the
fundamental type of all software, which is memory. There is nothing more
fundamental to the software system. When you run out of memory and fail to
check for that condition, the only right and proper thing to do is crash. NULL
is a handy way to enforce that fundamentally, in that dereferencing a null
pointer guarantees a crash.

Part of programming is about memory management. You can hide it with a
sophisticated compiler but it's always going to be there, and there are always
going to be two types of memory: that which is available, and that which is
not.

~~~
sjolsen
>NULL is a handy way to enforce that fundamentally, in that dereferencing a
null pointer guarantees a crash

In the context of C and C++, dereferencing a null pointer does not guarantee a
crash, or anything else for that matter. In fact, not only does it not give
you any guarantees, it nullifies any guarantees you might otherwise have had,
because the behaviour of a program that dereferences a null pointer is
undefined. Now, an implementation is of course free to make guarantees about
programs that have undefined behaviour, but none I know of does.
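
Since the crash is not guaranteed, the portable option is to test the result before touching it. A small sketch, with a hypothetical `duplicate` helper:

```cpp
#include <cstdlib>
#include <cstring>

// Returns a heap copy of `text`, or nullptr if allocation fails.
// The caller owns the result and releases it with std::free.
char* duplicate(const char* text) {
    std::size_t n = std::strlen(text) + 1;
    char* copy = static_cast<char*>(std::malloc(n));
    if (copy == nullptr) return nullptr;  // explicit check, no dereference
    std::memcpy(copy, text, n);
    return copy;
}
```

Whether the caller then aborts, retries or reports the failure is a separate decision. The point is only that the null pointer is never dereferenced.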

>Part of programming is about memory management. You can hide it with a
sophisticated compiler but it's always going to be there, and there are always
going to be two types of memory: that which is available, and that which is
not.

This is not true. It is perfectly possible to write useful programs that never
have to deal with unavailable memory or even memory management at all past
compile time, even in C. This is quite common (as are implementations that
don't crash programs which dereference null pointers) when working with
embedded systems, for example.

------
halayli
Compilers can give us a warm feeling when they help us enforce invariants at
compile time, but those invariants can only be applied on information known at
compile time. Pointer values cannot be known at compile time.

not_null gives you the illusion of creating compile time invariants based on
dynamic values. But that's impossible. You can still pass a nullptr to a
not_null parameter indirectly which will assert at runtime. It just prevents a
user from passing nullptr directly as a function argument. It provides no
advantage over passing by reference. If you pass a null reference (you gotta go out
of your way to do that), it will fail at runtime just like not_null. not_null
does provide a consistent way of doing assert(value != nullptr) but an assert
is clearer imo, not_null hides the assert away from the programmer and adds
ambiguity. You can forget to assert just like you can forget to specify a
parameter as not_null.

    
    
      int *x = nullptr;
      int **value = &x;
      func(*value);
    

This will compile just fine, but will assert at runtime.
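
To make the compile-time versus run-time split concrete, here is a minimal stand-in for such a wrapper. It is an illustrative sketch, not the GSL or any other library's implementation, and it throws instead of asserting so the failure can be observed. `read_value` and `rejects_indirect_null` are made-up names.

```cpp
#include <cstddef>
#include <stdexcept>

// Minimal stand-in for a not_null wrapper (illustrative only).
template <typename T>
class not_null {
public:
    not_null(T* p) : ptr_(p) {
        // Run-time check: a real library might assert or terminate here.
        if (p == nullptr) throw std::invalid_argument("null passed to not_null");
    }
    not_null(std::nullptr_t) = delete;  // a literal nullptr is a compile error
    T& operator*() const { return *ptr_; }
    T* get() const { return ptr_; }
private:
    T* ptr_;
};

int read_value(not_null<int> p) { return *p; }

// Mirrors the snippet above: the null arrives indirectly, so the compiler
// accepts the call and only the run-time check catches it.
bool rejects_indirect_null() {
    int* x = nullptr;
    int** value = &x;
    try {
        read_value(*value);
    } catch (const std::invalid_argument&) {
        return true;
    }
    return false;
}
```

Writing `read_value(nullptr)` directly would select the deleted constructor and fail to compile, which is the only case the type system catches in this sketch.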

If the program state cannot proceed when one of the functions is passed a
nullptr then assert in the function and make it clear, otherwise check for
nullptr condition and proceed accordingly. You can forget to do that just like
you can forget to increment a value.

------
roel_v
Beware that his example of boost::optional is, as far as I know, wrong. He
returns an empty student, which you can't test for (unless your Student() has
a way of telling you that it's default-constructed, and when a default-
constructed Student is in fact the same as a 'null' Student). Instead of
return boost::optional<Student>(); he should have done return boost::none; .

~~~
vp2015
Are you sure about that?

Both resources below say something opposite.

[http://bit.ly/1SEnJQe](http://bit.ly/1SEnJQe) \- see optional<T>::optional()

and [http://bit.ly/1lh7eyv](http://bit.ly/1lh7eyv) \- see default constructor

(I'm just curious)

Edit: I've tested it, you are wrong. Default constructor of optional doesn't
call Student constructor (VS2015).
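
The same check can be sketched with C++17's `std::optional`, used here as a stand-in for boost::optional (an assumption: the two behave alike on this point, but this code only exercises the standard one). The `Student` type with a constructor counter is made up for the test.

```cpp
#include <optional>

struct Student {
    static inline int constructed = 0;  // counts default-constructor calls
    Student() { ++constructed; }
};

// Two ways of returning "no student".
std::optional<Student> nobody_default() { return std::optional<Student>(); }
std::optional<Student> nobody_nullopt() { return std::nullopt; }
```

Both functions return an empty optional without running `Student()`. The counter only moves once a value is actually placed inside, for example with `emplace()`.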

~~~
roel_v
Yes you're right, I misread. I still think the preferred idiom is to use
boost::none (like the optional documentation does).

~~~
vp2015
In fact boost::optional examples use default constructor in this case - at
least all examples I've seen, see
[http://www.boost.org/doc/libs/1_58_0/libs/optional/doc/html/...](http://www.boost.org/doc/libs/1_58_0/libs/optional/doc/html/boost_optional/examples.html)

------
userbinator
NULL/zero is a _very_ simple concept. It means the lack of something. C/C++'s
equating of null/zero with false also makes great sense when you think of it
this way. Thus my personal belief is actually "avoid NULL, use zero instead."

 _program will fail in run-time. Sometimes it's not an acceptable situation,
sometimes it just makes debugging other problems more complicated._

I disagree. If I were to make a list of all the bugs I had to find over the
years, sorted by difficulty, a nullpo (usually caused by something else) would
be near the bottom. They're quite obvious precisely because of the crash that
occurs when they're used, and you can usually easily trace back to the origin
of that value.

Seeing all these clumsy "solutions" that increase complexity just to avoid
something so simple is both sad and somewhat amusing. It's a bit cargo-culty.
E.g. if you use the "Null Object Pattern", with the example given you still
have to presumably check whether the returned value is an instance of the null
object or not... so not only do you still have to do the same thing you'd have
to do if you'd just used a simple 0, you now also have to make an additional
class and figure out how to compare with it. It's obfuscatory.

I think avoiding "NULL" makes as much sense as avoiding the number 0 - i.e.,
none, null, \0, nada, zilch. ;-)

~~~
hawkice
Yes, NULL means the lack of something. Very simple. But not having something
when you think you do is by far the most common class of errors. So if you are
always clear about the fact you either WILL have something or MAY have
something (instead of it being hidden), you can write software that you are
more confident in, and (hopefully) crashes with completely opaque errors less
frequently. [This is why I happen to completely agree w.r.t. Null objects -- I
don't see a point unless it means something extremely specific. Like, File
openFile() could return an UnreadableFile null-ish object as an error code, but
that seems worse than just having checked exceptions.]

~~~
userbinator
_But not having something when you think you do is by far the most common
class of errors._

In all the things I've debugged that's almost never been the actual error, but
rather a symptom of something else (running out of memory, failed file
opening, etc.) that needs further investigation. In my experience the "check
for null, skip processing and continue if a value is null" way is a far worse
option since it causes errors to propagate further, whereas a crash is an
unmistakable call to action.

~~~
hawkice
Depends a lot on programming language. Stuff like that tends to end up in
Exceptions, if you have decently useful exceptions available.

------
kirkbackus
In the last example, wouldn't a Singleton to represent a null student be more
effective since you could do a check for the null student by reference?
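
A rough sketch of what that could look like, assuming a hypothetical `Student` class (the article's actual class is not reproduced here): one shared "no student" instance, recognised by address.

```cpp
#include <string>
#include <utility>

class Student {
public:
    explicit Student(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }

    // Single shared "no student" instance (hypothetical design).
    static const Student& none() {
        static const Student instance{""};
        return instance;
    }
    // Identity check: true only for the shared instance itself.
    bool is_none() const { return this == &none(); }

private:
    std::string name_;
};

// Hypothetical lookup: only id 1 exists.
const Student& find_student(int id) {
    static const Student ada{"Ada"};
    return id == 1 ? ada : Student::none();
}
```

One caveat with this design: a copy of the null student has a different address, so `is_none()` is false for it. Callers would need to hold the result by reference for the check to work.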

------
vp2015
I've just updated this article, thanks for the interest & for sharing it btw.

------
mirimir
Maybe avoid null in C++. I have no clue.

But null is a useful concept in SQL.

~~~
yahelc
Useful but dangerous:

If you have (1,2,3,4,NULL) as data in column foo, and you query

    
    
        SELECT foo FROM tbl WHERE foo != 1
    

Returns:

    
    
        2
        3
        4
    

Which makes sense, but can be very confusing if you're not explicitly looking
out for it.

Redshift (and I think Postgres generally) will also, very confusingly if
you're not aware of it, return no rows if your subquery returns any NULL
values in a NOT IN scenario.

If column foo in bar is (1,2,3,4,5)

    
    
        SELECT * FROM bar WHERE foo NOT IN (SELECT foo FROM tabl)
    

Will return no results (instead of a single row with 5).

~~~
mirimir
SELECT * FROM bar WHERE foo IS NOT NULL AND foo NOT IN (SELECT foo FROM tabl)

~~~
dllthomas
The fact that you seem to think that changes anything clearly demonstrates
that this behavior is confusing.

~~~
mirimir
OK, so I don't currently have a server to test that on. And I'm a little
rusty.

But are you saying that my query doesn't resolve the NULL issue? I know that
I've done just that with tables containing NULL values. Maybe I screwed up the
format.

~~~
dllthomas
Your adjustment will filter out any nulls that would have otherwise been in
the output. The issue was not NULLs in the output (there was nothing in the
output), but NULLs in the subquery in the NOT IN clause. You could address
this by filtering inside the subquery:

    
    
        SELECT * FROM bar WHERE foo NOT IN (SELECT foo FROM tabl WHERE foo IS NOT NULL)
    
    

Making this a little more realistic:

    
    
        SELECT * FROM activation_code
        WHERE code NOT IN (
            SELECT activation_code_used FROM user
        );
    

If that query was previously returning many rows, I contend that many people
would not expect it to start returning zero rows just because a user was
created without using an activation code.
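
The rule behind both surprises is SQL's three-valued logic: a comparison with NULL is UNKNOWN, and WHERE keeps a row only when the predicate is TRUE. A small C++17 model of that rule, using `std::optional` for NULL and UNKNOWN (all names here are made up, and this models the semantics rather than any database engine):

```cpp
#include <optional>
#include <vector>

using SqlInt = std::optional<int>;    // nullopt models NULL
using SqlBool = std::optional<bool>;  // nullopt models UNKNOWN

// x = y is UNKNOWN if either side is NULL.
SqlBool sql_eq(SqlInt x, SqlInt y) {
    if (!x || !y) return std::nullopt;
    return *x == *y;
}

// x NOT IN (list): FALSE on a definite match, otherwise UNKNOWN if any
// comparison was UNKNOWN, otherwise TRUE.
SqlBool sql_not_in(SqlInt x, const std::vector<SqlInt>& list) {
    bool saw_unknown = false;
    for (const SqlInt& item : list) {
        SqlBool eq = sql_eq(x, item);
        if (!eq) saw_unknown = true;
        else if (*eq) return false;
    }
    if (saw_unknown) return std::nullopt;
    return true;
}

// WHERE keeps a row only when the predicate is TRUE, not UNKNOWN.
std::vector<int> where_not_in(const std::vector<int>& rows,
                              const std::vector<SqlInt>& list) {
    std::vector<int> kept;
    for (int r : rows)
        if (sql_not_in(r, list).value_or(false)) kept.push_back(r);
    return kept;
}
```

With rows (1,2,3,4,5) and a list of (1,2,3,4,NULL), the row 5 evaluates to UNKNOWN and is dropped, so nothing is returned. Remove the NULL from the list and 5 comes back.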

~~~
mirimir
Thanks. I am sometimes just so perfectly assbackwards, and so blind to it ;)

------
alfiedotwtf
Something something Rust.

