Why we've banned null in our codebase (stackmob.com)
45 points by zazenergy 1133 days ago | 59 comments



Since he's writing Java, I would rewrite his initial function:

    public int getIntFromHeader(Request req, String headerName) {
        if(req != null && req.getHeaders() != null) {
            String header = req.getHeaders().get(headerName);
            if(header != null) {
                try {
                    return Integer.parseInt(header);
                } catch(Exception e) {
                    return -1;
                }
            }
        }
        return -1;
    }
as

    public int getIntFromHeader(Request req, String headerName) {
        return Integer.parseInt(req.getHeaders().get(headerName));
    }
That way it either returns with a valid int or throws an exception of some sort (null or otherwise).

Also, -1 is a perfectly valid integer and so this function can't distinguish between a header with -1 in it and an error. The fact that there's a req != null check seems silly. Surely, the caller should be responsible for that and if they aren't they should get an exception.

-----


My preferred approach here is to have two methods; one that throws and one that also takes a default value.

If the caller would handle the not-present case by assuming a value then use the default-value version; if it's an error then use the throwing version.
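
A minimal sketch of that pair of methods, assuming the headers are available as a plain `Map<String, String>`. `HeaderInts` and both `getInt` overloads are hypothetical names, not anything from the article. In this version the default only covers the absent case; a malformed value still throws.

```java
import java.util.Map;

// Hypothetical helper; the class and method names are illustrative only.
final class HeaderInts {
    private HeaderInts() {}

    // Throwing version: a missing header is an error the caller must see.
    static int getInt(Map<String, String> headers, String name) {
        String raw = headers.get(name);
        if (raw == null) {
            throw new IllegalArgumentException("missing header: " + name);
        }
        return Integer.parseInt(raw); // NumberFormatException if malformed
    }

    // Default-value version: the caller states what "absent" should mean.
    static int getInt(Map<String, String> headers, String name, int defaultValue) {
        String raw = headers.get(name);
        return raw == null ? defaultValue : Integer.parseInt(raw);
    }
}
```

With this split, a header containing -1 and a missing header are no longer indistinguishable: the caller either gets an exception or picks the fallback explicitly.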

-----


And even in this case, a method full of != null checks seems silly when you can just catch NullPointerExceptions and return the default, no?

Also, Java is statically typed. If you need such dynamic capabilities, look elsewhere?

-----


The problem with catching an NPE is that you don't know what was null, so you can't know if it was expected or not. Was req null? Or was the NPE thrown from inside getHeaders()? Or something else entirely?
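
To make that concrete, here is a small sketch. `BuggyRequest` is a made-up stand-in for the article's Request class, with a deliberately uninitialized field, and `NpeCatchDemo` is likewise hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Request type discussed in the thread.
final class BuggyRequest {
    private Map<String, String> headers; // bug: never initialized

    Map<String, String> getHeaders() {
        // Copying a null map throws NullPointerException inside getHeaders().
        return new HashMap<>(headers);
    }
}

final class NpeCatchDemo {
    private NpeCatchDemo() {}

    // Treats every NullPointerException as "header absent".
    static int getIntOrDefault(BuggyRequest req, String name, int defaultValue) {
        try {
            return Integer.parseInt(req.getHeaders().get(name));
        } catch (NullPointerException | NumberFormatException e) {
            return defaultValue;
        }
    }
}
```

A null `req` and the initialization bug inside `getHeaders()` both end up returning the default, so the bug is silently reported as "no header".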

-----


Right, but in the provided example what was null doesn't seem to matter. Only that there was a null reference.

-----


It's normally considered bad form to catch NPE, particularly if a simple (x == null) check is the alternative. But that's largely a code-style issue; each to their own.

I'm not sure what you mean by your static typing comment; I don't see where dynamic typing would offer a better alternative to anything we're talking about here?

-----


It just seems like in a statically typed language, users should be more aware of nulls and handling them. Maybe I'm being naive, but it seems inherent in the paradigm that you shouldn't be playing as loosely with null and not null pointers. You should be fully able to expect a value to be there or not, by virtue of the certainty of your types and your conventions.

I suppose when you talk about a large codebase used by people who are unaware of your conventions then you can run into issues, but I'd imagine that either null serves a purpose in your system, or not. If it does, handle it with the appropriate logic, if not, you shouldn't have to worry.

Dynamic typing would offer you the convention of inherently being uncertain about what you're receiving and being overly defensive about everything.

-----


Agreed. The caller should be responsible for checking whether req is null. As for getHeaders, why can it return null? When does a request have null headers?

-----


I agree that it's a bad idea to have a null check on req & req.getHeaders(). One might think that this kind of thing makes your code more resilient, but instead it just makes it harder to track down the inevitable bugs like forgetting to initialize Request#headers.

-----


This seems to be missing the point. There certainly are paradigms (C++ RAII being the most obvious) where you work with objects or data that can be affirmatively guaranteed to exist. And those indeed have value and are worth emulating even in environments that don't natively support them (the linked article is in Java).

But that has nothing to do with "null". There are other paradigms (linked lists or weak references, say) where you want the ability to distinguish between a "valid" object and an invalid/empty/not-present one. And whether you call this thing "null" or not is just a semantic question. It's no less an error to operate past the end of the list or try to access an expired cache entry just because you decided to allocate memory to store it.

-----


Nah, that's not the issue. The way the game works is this:

1. Decide on some property you want the compiler to enforce for you. E.g. I care about only operating on values that exist.

2. Encode this in the type system. E.g. use the Option type to distinguish those values that may not exist (Options) from those that definitely do (everything else).

It's not about being able to distinguish valid and invalid values, it's about getting the compiler to enforce it for you. This is the real power of modern type systems -- not bookkeeping to keep the optimizer happy but getting the computer to check the things you care about.
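
A rough Java approximation of the two steps, assuming `java.util.Optional` (Java 8 or later) is available. `TypedAbsence` and its methods are hypothetical names. Java's compiler still accepts null for a plain `String`, so this only works together with the convention that non-Optional references are never null.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical example class; names are illustrative only.
final class TypedAbsence {
    private TypedAbsence() {}

    // The return type says the header may be absent.
    static Optional<String> findHeader(Map<String, String> headers, String name) {
        return Optional.ofNullable(headers.get(name));
    }

    // The parameter type says a value is required.
    static int parse(String raw) {
        return Integer.parseInt(raw);
    }

    static int contentLength(Map<String, String> headers) {
        Optional<String> raw = findHeader(headers, "Content-Length");
        // parse(raw) would not compile: Optional<String> is not a String.
        // The caller has to say what happens when the value is absent.
        return parse(raw.orElse("0"));
    }
}
```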

-----


In the specific example of the blog post, an isDefined() is still necessary. How is `isDefined()` different from `ref != null`?

-----


The difference is that you can distinguish at the type level between values that could be "null" (Optionals) and values that are never null, and the two look different from each other. Otherwise you tend to have one of two problems:

a) Every function checks all its parameters for null, and you do null checks before calling a method on any object, wasting time.

b) It's down to the programmer to figure out where a null check is necessary and where it isn't, and they sometimes get it wrong.

(Of course, better code will avoid calling isDefined or get, but even if you just use it as a like-for-like replacement, using optionals in place of null can improve your program)

-----


Null checks in Java are quite cheap, from what I've read.

If your code doesn't check for a null, the JIT adds in an implicit check in case it needs to throw a NullPointerException. When you put a check in your code, it knows to eliminate the implicit check.

Putting null checks everywhere does lead to ugly, complicated-looking code though; it'd be nice if there were a simple syntax for non-null parameters.
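
Short of language syntax, the closest thing in the standard library that I'm sure of is `java.util.Objects.requireNonNull` (Java 7+). A minimal sketch, with `Greeter` as a made-up class:

```java
import java.util.Objects;

// Hypothetical class used only to illustrate a fail-fast null check.
final class Greeter {
    private final String name;

    Greeter(String name) {
        // Fails immediately, at the call that passed null,
        // instead of later when the field is first used.
        this.name = Objects.requireNonNull(name, "name must not be null");
    }

    String greet() {
        return "Hello, " + name;
    }
}
```

This is still a run-time check, not a compile-time one, but it keeps the check to one line per parameter and puts the failure next to its cause.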

-----


It's not really about computational expense. It's about making reliable, maintainable software.

The problem with null is that it's a semantic gremlin. Depending on context it could mean all sorts of things including but not limited to, "the value has not yet been set", "the value is undefined", "the value is nothing" or "the software is broken". Which one of those is the correct interpretation at any given time is a question that can only really be answered by a programmer with a good knowledge of the codebase.

Worse yet, even in cases where null isn't a desired behavior it's still inescapable. If your procedure accepts reference types as parameters, it doesn't matter if it has no interest in taking null as an argument, or that it doesn't even make any sense (semantically) for null to be passed as an argument. It still has to be prepared to be handed null as an argument. At a language level, there's simply no support for the concept of, "Guys, I really do need something to go here." Which is simply insane, considering how simple a concept it is.

In languages that don't have null references, it's not that the various semantic meanings of null have been thrown out. All that's been removed is the ambiguity: For each possible (desirable) meaning, a construct is provided to represent specifically that meaning. Meaning that you've gotten rid of the problem of programmers having to magically know what null means in that case, and the worse problem of programmers introducing bugs resulting from them failing to understand null's meaning in a particular context.
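
As a sketch of "one construct per meaning" in plain Java, here is a made-up `Setting` type that names two of those meanings separately instead of overloading null for both. The type and its factory methods are hypothetical.

```java
import java.util.Objects;

// Hypothetical type: each meaning that null might have carried gets its own name.
final class Setting {
    enum Kind { NOT_YET_SET, EXPLICITLY_EMPTY, PRESENT }

    private final Kind kind;
    private final String value; // meaningful only when kind == PRESENT

    private Setting(Kind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    static Setting notYetSet() { return new Setting(Kind.NOT_YET_SET, ""); }

    static Setting explicitlyEmpty() { return new Setting(Kind.EXPLICITLY_EMPTY, ""); }

    static Setting of(String value) {
        return new Setting(Kind.PRESENT, Objects.requireNonNull(value));
    }

    Kind kind() { return kind; }

    String describe() {
        switch (kind) {
            case NOT_YET_SET: return "not yet set";
            case EXPLICITLY_EMPTY: return "empty";
            default: return "value: " + value;
        }
    }
}
```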

-----


It isn't. The difference is that in all the places where you don't use Option[T], you no longer have to have a null check because you have "banned null."

This doesn't seem like a particularly good idea if any of your code needs to run reasonably quickly. Even relatively tame uses of Option[T] in the collections that ship with Scala can have a terrible impact if you want to run the code rather than just typecheck it. For instance, if you have some loop in which you want to perform I/O and call getOrElseUpdate on your scala.collection.mutable.HashMap[T, Long], your code will probably spend a majority of its time creating millions of java.lang.Longs, stuffing each one into its own newly-minted Some[Long], and GCing these two things.

-----


If you changed "reasonably quickly" to "extremely quickly" I would agree. There is some overhead involved, but it is very small. In a copying collector, like the young generation of HotSpot's generational GC, the GC cost is proportional to the amount of live (non-garbage) data, not to the amount of garbage. Creating lots of very short-lived objects is almost free (allocation is very fast in all GCs). The cost is in memory consumption -- a classic tradeoff of time for space.

Of course if this is an issue you can always fall back to Java's collections or a collection class specialised to Longs. I think the Scala implementors made the right tradeoff for a general purpose library.

-----


The idea is that you don't want to call isDefined at all - you'll want to call getOr(T default) or pass it through a chain of methods that are overloaded doFoo(Some<T> x) { .. stuff .. return Some<U> }/doFoo(None x) { return None; }

-----


The latter won't work for you as-is:

  scala> def doFoo(x: Some[Int]) = x.get*2
  doFoo: (x: Some[Int])Int

  scala> def doFoo(x: None.type) = None
  doFoo: (x: None.type)None.type

  scala> doFoo(Option(3))
  <console>:9: error: type mismatch;
   found   : Option[Int]
   required: None.type
                doFoo(Option(3))
                            ^
But it will work as either of these:

  scala> def doFoo(x: Option[Int]): Option[Int] = x match {
       |   case Some(n) => Some(n*2)
       |   case _ => None
       | }

  scala> def doFoo(x: Option[Int]) = x.map(_*2)
  doFoo: (x: Option[Int])Option[Int]

-----


How is your description different from "emulating paradigms like RAII in environments that don't natively support them"? And it doesn't address things like the trivial list/cache case, where clearly you need a null (or an equivalent object reference). Clearly the compiler isn't going to be able to enforce that all lists are infinitely long...

I'm not saying it's a bad trick. I'm saying that thinking of it as "banning null" is an incomplete understanding of the issue.

-----


I hate to say "you should," but you really should spend a few hours going through a Haskell tutorial. You'll be surprised how different things are.

The Haskell equivalent to this Scala Option fiasco is Maybe, which is defined as:

    data Maybe a = Just a | Nothing

"Nothing" is not a null reference and "Just a" is not an object with a pointer to another object. "Just a" is a constructor with a single value parameter. "Nothing" is a constructor that takes no parameters. These are values, not references or values with references or references to references. "Nothing" always has a specific type (Maybe Integer, Maybe String, etc.) even though it has no auxiliary data of its own. This means "Nothing" can't show up anywhere you have values--it can only show up where you expect a value of type Maybe. Indeed, the whole point here is that you are forced to explicitly reckon with the possibility of null where it can happen, and you are freed from responsibility to worry about it everywhere else.

Haskell does just fine without the kind of null we're talking about, in a way that I suspect is fundamentally different from what you're intuiting.

-----


When you need a null, you use None.

-----


No, it's very much not semantic. With nulls, a public method accepting a Foo must be prepared to deal with receiving a null pointer for Foo. With Options, if you declare a parameter as Foo, you've communicated that you expect a Foo; if you declare an Option<Foo>, you're prepared to accept a None.

-----


I think you are missing the point.

The problem is that for the most part needing to check whether an object is null/"valid" should be an exceptional case, not a common one. Most object variables do not need to have the optional/null state. Languages that allow every object to be null create a situation where you need null checks for every object you encounter, unless you come up with some error-prone convention outside the language itself. Every method/function has to handle nulls, it's not clear whether an object will be null when passed into a method or not, and if you are writing APIs for people to use, you must check for nulls because you cannot expect much from the users of those APIs.

TLDR; Having non null objects by default reduces the amount of code, which in turn increases the correctness of the code.

-----


> needing to check whether an object is null/"valid" should be an exceptional case

Handily enough, C#, Java and similar languages actually throw an exception when you try to work with a null reference as if it wasn't null. These exceptions are generally only cryptic when the null references are stored somewhere other than the stack before being manipulated. That is, so long as foo(null) throws an exception before foo() returns, it's no big deal to diagnose, and mostly unnecessary to check the value of the parameter as it passes through the graph of method calls.

(To a degree I'm arguing a devil's advocate position, as I have a lot of sympathy for encoding not-null-by-default into the type system.)

-----


In that world you have bugs lying in wait inside the code, ready to rear their heads at inopportune times; better not to allow their existence at all. Also, in terms of difficulty of fixing bugs, it's a lot easier to fix them at compile time on your machine than it is later on in the release cycle. I think this kind of attitude is the reason why maintenance is the most expensive part of software development. It's also the work involving the most drudgery.

-----


In my experience, the two biggest reasons for expensive maintenance are (a) not all the code is understood by the people maintaining the system because past maintainers have moved on, and (b) assumptions have been made for performance or simplicity reasons that remove abstraction boundaries and embed assumptions about how the whole works inside different parts of the system, such that when one part needs major change, many other parts also need careful change; this is made worse by (a), because later maintainers usually only understand a subset of the individual parts.

This isn't something easily fixed by a compiler or language feature. There's no magic bullet.

Bugs "lying in wait" are a minor problem; if they lie in wait long enough, they are almost by definition not a problem at all.

-----


>The problem is that for the most part needing to check whether an object is null/"valid" should be an exceptional case, not a common one.

It's possible to code around it if you're presenting a packaged executable or product, but if you're creating an API of any sort, or dealing with user input, it should be a standard case.

Blank inputs or null cases should always be checked in that case.

-----


I don't think you understand. If you don't allow null by default in your type system then you don't have to check for it when you don't need it.

Dealing with user input is a separate issue, yes you need to validate it. The problem with every type allowing null is that it infects EVERY object for a situation that isn't always needed or wanted.

-----


While I agree in theory, boxing a null like that can have severe garbage collection implications. If every core function that required a simple two or three state response returned a boxed object then it also adds a requirement for the GC to "throw away" the object when necessary. The more trash the GC needs to clean up = the more time the GC uses cleaning up.

I'd be careful - that's all. I've seen systems which have done exactly this for core pieces of logic and they have suffered miserably during long running processes.

Mind you I'm referring to Java/.NET here - I'm not familiar with Scala's subsystem so can't comment on that...

-----


The idea here isn't to replace every T with Option[T]. The idea is to locate the places where incoming data might be absent and handle it there, so that the other 90% of your code can just use T with wild abandon. Like most methodologies, if you do it wrong it will be a fiasco and if you do it right it won't be so bad.

-----


This seems indicative of the arrow anti-pattern[1] and guard clauses[2][3] would be preferable to banning null and cluttering your class definitions with isDefined everywhere.

[1] http://www.codinghorror.com/blog/2006/01/flattening-arrow-co...

[2] http://martinfowler.com/refactoring/catalog/replaceNestedCon...

[3] http://c2.com/cgi/wiki?GuardClause

The most important point: handle negative conditions first, i.e. check for null up front and return immediately when it is detected.
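
Applied to the function quoted at the top of the thread, that might look like the sketch below. `SimpleRequest` is a hypothetical minimal stand-in, since the real Request class isn't shown anywhere here.

```java
import java.util.Map;

// Hypothetical minimal request type; the article's real class is not shown.
final class SimpleRequest {
    private final Map<String, String> headers;

    SimpleRequest(Map<String, String> headers) {
        this.headers = headers;
    }

    Map<String, String> getHeaders() {
        return headers;
    }
}

final class GuardClauses {
    private GuardClauses() {}

    // Same behavior as the nested original, with each negative case
    // handled first and returned from immediately.
    static int getIntFromHeader(SimpleRequest req, String headerName) {
        if (req == null) return -1;

        Map<String, String> headers = req.getHeaders();
        if (headers == null) return -1;

        String header = headers.get(headerName);
        if (header == null) return -1;

        try {
            return Integer.parseInt(header);
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

It is flatter, but it keeps the original's -1 sentinel, so it does nothing about the ambiguity others have pointed out.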

-----


You should look at the second part of the blog post[1]. It shows how to get around the "arrow anti-pattern" using map, filter, flatMap and for-comprehensions in Scala. It's actually both simple and elegant! Also not very hard to understand.

[1]: https://blog.stackmob.com/2013/01/free-yourself-from-the-tyr...

What you're actually doing is using the Option monad in Scala, but the article doesn't tell you that to avoid scaring you. And it succeeds: the pattern described in the second part of the blog post is easy to follow and clearly has no magic.
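
For anyone who doesn't read Scala, here is a rough Java analog of the same idea, assuming `java.util.Optional` (Java 8+). This is not the article's code; `OptionChain` and its methods are made-up names.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical example class; not taken from the blog post.
final class OptionChain {
    private OptionChain() {}

    // Parses a string, turning the failure case into an empty Optional.
    static Optional<Integer> parseInt(String raw) {
        try {
            return Optional.of(Integer.parseInt(raw));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // No nesting and no explicit presence checks: each step runs only
    // if the previous one produced a value.
    static Optional<Integer> getIntFromHeader(Optional<Map<String, String>> headers, String name) {
        return headers
            .map(h -> h.get(name))           // a null result becomes empty
            .flatMap(OptionChain::parseInt); // a parse failure becomes empty
    }
}
```

A caller that wants a fallback can finish the chain with `orElse(someDefault)`, which keeps the choice of default at the call site.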

-----


Option types represent computations that can fail. In order for an option type to be at all useful, you need to be able to actually do computations: take an Option<A> and a function A → B, and get back an Option<B>. That’s exactly what the Functor, Applicative, and Monad instances for the Maybe type in Haskell do.

Here, you’ve just traded one kind of checking (!= null) for another (isDefined). This is more type-safe, to be sure, but has a high legibility cost. There is also no gain in exception safety: NoSuchElementException is not appreciably more meaningful than NullPointerException.

-----


In my experience the legibility cost is more than outweighed by the fact that Option is only used in a small set of places, rather than implicitly used everywhere as with nullable references. Because it's only used in a few places, the use of `.isDefined()` or pattern matching stands out and actually helps to make the reader aware of where computations could fail.

-----


Sure, it’s certainly better to be explicit. I’m just saying that you can have explicit checked optionality and safety and legible code.

-----


You should read part 2 of the post.

-----


Everybody should read part 2 of this post[1]. That's where all the interesting things are! I wish they had either not separated out the interesting bits into a different article or that part 2 was the article on HN.

[1]: https://blog.stackmob.com/2013/01/free-yourself-from-the-tyr...

Essentially, part 2 shows how you can very easily overcome the issues with the approach in part I using some nice Scala features. You can get elegant code without sacrificing the more expressive types.

-----


At the end of the day, you can't get away from the fact that arguments and return values may be NULL or they may not, and if so, NULL is interpreted in a certain way by the program that depends on the context. Whether these semantics are documented, enforced by the language, or neither, the programmer using a function must know them in order to write correct code. The convention I've worked with is that these semantics are either obvious (because the scope of a function is small, and it's reasonable to require programmers to examine the function to see what it does) or documented. Mandating that the basic types cannot be null and requiring a special null-able type forces programmers to consider this issue, but if you work with a code base that understands the above, I'm not sure the trouble and obfuscation is worth it.

There's a history of elevating useful programming patterns (constructors, generic references, and dozens of other C++ features come to mind) to the level of being enforced by the language or runtime, and in most of these cases I've found the cost of doing so (in language complexity and inflexibility) far outweighs the benefit of preventing programming mistakes.

-----


I wish more languages treated null like Objective-C, where if a method is called on a null object, nothing happens and the method returns null. In user land a 'null bug' in Objective-C often results in a feature not responding, as opposed to the entire app crashing. It's a much better user experience, and I can write code less paranoid about null values as well.

The car doesn't need to blow up if the air conditioning isn't working.

-----


Keeping the program running with corrupted data sounds very dangerous and brittle. Reminds me of PHP, where passing the number 5 or the string "5" into a function returns the same thing, whereas passing the number 0 or the string "0" can behave completely differently. Not caring about what data you have and just trying to keep running is just sweeping the problems under the carpet and waiting for them to come bite you when you least expect it.

-----


Is it better for you, or for your user? Are you building a game, a social app, or space shuttle software? These questions should guide your decision in language requirements.

-----


You still have to worry about calling a method on a released object or a method it doesn't respond to. Making sure you set an object to null when you release it is good practice for that reason.

-----


True, I wish I could take Objective-C's handling of null and put it into C# or Java.

-----


Google provides Optional<T> as part of the Guava library, so this is a fairly well-known idea. I find it fairly tedious to use though.

The @NotNull annotation seems like it should be the answer, but I don't know of anyone that uses it. Would love to hear if anyone has figured out how to get value out of @NotNull!

-----


Haskell does this sort of thing in a slightly more elegant manner using the Maybe monad http://en.wikibooks.org/wiki/Haskell/Understanding_monads/Ma...

-----


Any language that allows null should also provide a non-nullable type. Kind of the opposite of C#'s ?-operator that turns non-nullable types into nullable ones. In an SQL table it's very natural to choose whether an item is allowed to be null or not, so why not in a programming language?

It still baffles me why C# and Java do not have this.

At least in C# you can use code contracts to constrain functions to not accept "possibly" null values. I haven't used this myself but the demos I've seen look impressive. Anyone got hands-on experience of this or similar?

-----


Code contracts might be enforced using assert macros, I guess this would achieve more or less the same effect, right?

-----


No. assert() validates at run time; contracts are checked at compile time.

-----


> public Option<Integer> getIntFromHeader(Option<Request> request, String headerName) {

> We’ve now, strictly speaking, solved our problem

No, now you have two problems. What if request is null?

-----


I was recently working with mainframe code for an update operation, and my code was sending "NULL" for the last name. Surprisingly, there was a person with NULL as a last name (it seems to be a valid last name; search Facebook or LinkedIn and you will find plenty of them). I was pulling my hair out when I saw that all the update transactions were being applied to that particular person's account.

TLDR: Careful, there could be someone whose last name or first name is NULL.

-----


Google Cache: http://webcache.googleusercontent.com/search?q=cache:6CnwYpP...

-----


Banning null doesn't seem to have stopped their site from falling over.

There doesn't seem to be a google cache available.

-----


Title really needs to be postfixed with "in Scala".

-----


That would be a little silly, given how they're doing it in Java. Anyway, the paradigm translates easily to any OO language.

-----


Reading to the bottom and then viewing the subsequent post makes it pretty clear they are not doing it in Java; hence the need to change the title.

-----


The last paragraph says it's tough in Java and Scala makes this pleasant and they will explore that in the next post.

-----


Your codebase is written in COBOL?

-----


"Error establishing a database connection"

Made me laugh.

-----



