Nullness markers to enable flattening (openjdk.org)
69 points by kaba0 on Feb 7, 2023 | 87 comments



I worked on JVMs (in a mostly research capacity) for a good number of years, but I soon gave up on the idea that classes and instances of classes (thus, heap-allocated objects) are the only thing the VM and language should be designed around. But there is a lot of sunk cost in the engineering of the JVM, and there seems to be a lot of inertia against adding new primitives like tuples or unboxed types. That limits the source language design somewhat and definitely brings down the performance ceiling, while ironically raising the complexity of the engine (because it has to work extra hard to maintain the fiction of objects with identity).

Well anyway, I've diverged from that significantly now; Virgil has value types, tuples, ADTs, and closures, and happily unboxes them all.


Clearly we're adding a new kind of primitive to the language and VM. I don't see how whether the Java language's tuples are nominal or structural (there are pros and cons to either approach, but Java added nominal tuples just as it did with its lambdas) has any relation to performance or complexity at all.


The biggest issue with the ongoing Valhalla hurdles seems to be related to the goal of having old JARs still running in a Valhalla world without recompiling.

By now I also have my doubts that it will ever come; they should have taken more inspiration from Modula-3/Eiffel/Oberon when Java was originally designed.


Imo named tuples via records are better than tuples provided at the primitive level. They provide a ton more context for the next reader about what this thing actually is.
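For illustration, a minimal sketch in current Java (TextPosition is an invented name): a record acts as a named tuple, and the type and component names carry the context a bare (int, int) pair would not.

```java
// A record as a "named tuple": the type name and the component names
// tell the next reader what the two ints actually mean.
record TextPosition(int line, int column) {}

class TextPositionDemo {
    static TextPosition caretPosition() {
        return new TextPosition(12, 4);
    }

    public static void main(String[] args) {
        TextPosition p = caretPosition();
        System.out.println(p.line() + ":" + p.column()); // prints 12:4
    }
}
```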


> They provide a ton more context for the next reader about what this thing actually is.

Right until there’s no more context than “this returns 2 elements” and you end up with 15 different Pair types and more verbosity for no value.

Named tuples are not better than tuples, they’re complementary, both are useful tools.


There's definitely always more context than "this returns 2 elements". What are they? What are you storing them in? What did they belong to?


I'm trying to let programmers make the call here, which is why Virgil gives you both options. In casual hacking, anonymous tuples can be very useful. I find that especially true in test code, where I have a tendency to throw more bells and whistles at it to keep the meat of the tests really short (so I can write a lot of them). Requiring elaborate ceremony (such as defining a bunch of tuple types for no good reason) is just a waste of time.


> There's definitely always more context than "this returns 2 elements".

Not usefully so. If you’re splitting a collection in two, whatever context you add is almost certainly worthless overhead not worth the extra type.


In Virgil, you can have it either way, anonymous tuple types with members indexed with .N, or flattened data types (without identity) with field names:

  var p: (int, int) = (0, 0);
  var q = p.0 + p.1;

  type Point(x: int, y: int) #unboxed { }
  var p = Point(0, 0);
  var q = p.x + p.y;
Both will generate identical machine code with no allocations. It requires no escape analysis or assumption of non-identity (both tuples and data types have structural equality).


It'd be nice if you could inline the Tuple description into the method signature.

There's a bit of unnecessary overhead in having to externally declare and import the return type.

   public [K key, V value] getElement(int index);
This really comes down to the dynamic vs typed argument, and to me the answer is "minimise the boilerplate, maximise the guarantees".
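For comparison, a sketch of the status quo in current Java (Element, Indexed, and Pairs are made-up names): the pair type has to be declared out of line before it can appear in the signature, which is the overhead being described.

```java
import java.util.List;

// The return type must be declared separately before it can be used in
// a method signature -- there is no inline `[K key, V value]` form.
record Element<K, V>(K key, V value) {}

interface Indexed<K, V> {
    Element<K, V> getElement(int index);
}

class Pairs implements Indexed<String, Integer> {
    private final List<String> keys = List.of("a", "b");
    private final List<Integer> values = List.of(1, 2);

    @Override
    public Element<String, Integer> getElement(int index) {
        return new Element<>(keys.get(index), values.get(index));
    }
}
```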


I think the declaration-site vs use-site distinction is much more helpful to reason about in this context, and named tuples are a classic example of “if you didn’t declare these tuples to be compatible at definition time, it’s probably a bug to let you pass them around as if they were”


A code smell, sure - and a case for refactoring, but I wouldn't go as far as to say bug.

But I wouldn't expect that to be the common use case - I'd generally expect that the caller would just be implicitly importing and using the type effectively declared in the method header.


What's the technique to unbox closures and ADTs?


You gotta use multiple (machine) words to represent them. The ADT can't be recursive and the closure environment needs to be small enough. E.g. for delegates, the closure environment is just the receiver object, so using a fat pointer of two words (one code pointer, one object pointer) is sufficient.


Ah I see, so you represent all closures as two machine words, in a representation that the GC can understand?


Yes, all closures are represented as a reference word (traced by the GC) and a non-reference word[1], which is a pointer to the actual function entry point (i.e. code). For delegates (i.e. methods bound to an object), the reference is just the object, and the function pointer is the actual code (looked up through the v-table once). So that actually means a call to a first-class function is cheaper than a virtual call--it is literally an indirect call to the code pointer, passing the receiver object as the first argument.

[1] The two words even become separate, independent scalar values in the compiler IR. Thus the runtime support is minimal, in that it doesn't know anything about multiple-word values: the GC just needs to know whether a word is a reference or not.
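As a rough sketch (this is Java mimicking the scheme described above, not Virgil itself; FatClosure and Greeter are invented names), with a MethodHandle standing in for the raw code pointer:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Two-word closure: one GC-traced reference word (the receiver) and one
// code word (here a MethodHandle stands in for a raw code pointer).
// Calling it is a single indirect call passing the receiver as the
// first argument -- no v-table lookup at the call site.
class FatClosure {
    final Object receiver;
    final MethodHandle code;

    FatClosure(Object receiver, MethodHandle code) {
        this.receiver = receiver;
        this.code = code;
    }

    Object invoke(Object arg) {
        try {
            return code.invoke(receiver, arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}

class Greeter {
    final String prefix;
    Greeter(String prefix) { this.prefix = prefix; }
    String greet(String name) { return prefix + name; }

    // The virtual lookup happens once, when the delegate is created.
    static FatClosure greetDelegate(Greeter g) {
        try {
            MethodHandle mh = MethodHandles.lookup().findVirtual(
                    Greeter.class, "greet",
                    MethodType.methodType(String.class, String.class));
            return new FatClosure(g, mh);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }
}
```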


If I'm understanding this correctly, is the author saying that a future version of Java may let me use `String! foo` and the compiler will make sure this variable cannot be null?

That would be very helpful indeed.


Explicit nullability is one of my favorite features of kotlin. Kotlin basically treats nulls like a checked exception: deal with a null value immediately or incur the mental overhead of having your data treated like a 2nd-class citizen in all the code it touches.


Having briefly dabbled in Kotlin before returning to C#, this is one thing I really miss. C# has added non-null reference types, but there are a number of places I really wish it acted differently.


It baffles me that golang does not have this, which is otherwise a relatively modern language.


If Go were a modern language, it would've had algebraic data types and pattern matching since the beginning, and its handling of errors and enumerations wouldn't be quite so embarrassingly bad.


Go is not a modern language. It's an 80s language that happened to be made in the 2010s


There's a lot of stuff golang doesn't have :(


...to go faster.


Go is Limbo repacked in new clothes with Oberon-2 method syntax and unsafe package, hardly modern.

Even Limbo had better support for plugins.


I remember first reading about this project in 2014-ish; how long has this been going on? Seems like a ton of work, only to be able to encode the constraints that maintain correctness and performance like you can in a memory-unsafe language. The latter has always allowed you to "postulate" dropping identity via value copy and to encode 1-to-1 relationships by direct embedding.

I think I've been corrected in the past, where someone pointed out that managed languages don't have to be like that? What are existing approaches that allow for better control than traditional Java offers while retaining a convenient syntax?

Concerning the languages I know -- memory managed, reference only models always come with the possibility of indirection at every turn due to not having the distinction between dereference and member access (not in the syntax and not in the semantics). It's very hard to infer from the code if a dot operation is an injection, mathematically speaking. I believe that is the reason why I've always felt "safer" and more precise in memory unsafe languages.

I also think that some of the newer languages are misguided to hide the distinction by using dot syntax for both.


> come with the possibility of indirection at every turn due to not having the distinction between dereference and member access

Most managed languages don’t expose the location of an object, only its semantics. With optimizations (escape analysis), an object can be allocated on the stack and thus only effectively make member accesses. This optimization can be done much more deterministically with value types — if an object loses identity then the runtime/compiler doesn’t have to reason about whether it leaves the function/loop; it is free to copy it in/out of functions as often as needed.

As for preferences, it makes sense in low-level languages, but imo hiding it is the correct choice on a higher level — this is a core benefit of GCs, e.g. your public APIs will be much more maintainable/won’t need refactors because they don’t expose low-level implementation details (where should that argument be stored), only its semantics.


Note that using dot syntax across records and pointers is as old as Mesa and Ada.


Each time I read one of these posts about the JVM, I'm impressed by how they manage to deconstruct a "high-level" concept such as value type or reference type into more basic ingredients. Pinnacle of bottom-up design.


It's the correct direction to make nullness more explicit through the type system.


I despaired of the Java process' ability to fix this kind of issue when I saw what a hash they made of the Optional implementation (not allowing Optional to contain null undermines most of its advantages, and destroys all incentive for migrating to Optional if your codebase isn't going to reach 100% null-free in the foreseeable future; not to mention the (explicitly stated!) desire to "not be a monad", i.e. to have wilfully idiosyncratic behaviour).

Trying to make this a not-type is doomed to failure. It's a property of expressions, it has rules about how it flows through expressions; that means it's a type, or it's an ad-hoc informally specified implementation of half a type. It's recapitulating the mistake of checked exceptions all over again.


Could you expand on the “not a monad” part? Java’s type system with generics is surprisingly expressive, but it can’t have higher-kinded types, so a real Monad interface is not possible as far as I know. But it does have methods to use it as you would use a Maybe in Haskell, so I don’t find its occasional use problematic at all.

As for checked exceptions, I disagree that they were a mistake. They correspond one-to-one with Result types, but are just better integrated into the platform (proper stack traces, bubbling up by default, and auto-unwrapping). Java’s implementation leaves much to be desired (it didn’t have sum types back then), but checked exceptions themselves deserve a reevaluation.


> Could you expand on the “not a monad” part? Java’s type system with generics is surprisingly expressive, but it can’t have higher kinded types, so a real Monad interface is not possible as far as I know. But it does have methods to use it as you would use a Maybe in Haskell

People in the design mailing list discussion argued that those methods shouldn't be added because they didn't want Optional to be a monad.

> They correspond one-to-one with Result types

They don't, because throwing them is not a value (there is no way to have a variable that you can substitute for "throw e" and get the same behaviour), and because they're not properly integrated with the language's type system. Throws-ness is a separate parallel universe that behaves almost but not quite the same as expression type. Which is exactly what this is proposing to do with nullness.


The way Optional is (or, at the time, could have been) implemented is far from performant, so they probably just didn’t want people to over-rely on such an abstraction. Even IntelliJ’s default linter will call you out if you use it as a class field, for example.

While I do like more pure FP languages very much, I don’t think there is all that much value in everything being a value besides some mathematical elegance. Besides, a thrown exception can be turned into a value via a try-catch block, and can be just as easily tested as if it were a value, so where is the downside?

Exceptions are for exceptional situations, result types can still be used for very much expected ones. E.g. a primitive file read operation returning either a byte or EOF is great as a result type. Some specific IOException is better modeled as an exception which you might not be interested in handling at that level.

With all that said, I’m interested in the newer generation of languages with effect types that might bridge this gap.


> While I do like more pure FP languages very much, I don’t think there is all that much value in everything being a value besides some mathematical elegance. Besides, a throwing exception can be turned into a value via a try-catch block, and can be just as easily tested as if it were a value, so where is the downside?

It's really cumbersome in practice. E.g. when processing results from a database query (say), you probably want to call some callback for every row and then close the transaction and return the result of that callback. But handling exceptions around that becomes a significant pain. Once you've used a language with proper result types you don't want to go back, IME.


If the exception is something you can handle at that function’s scope, just put it inside a try-catch. Otherwise, let it bubble up and let it close the transaction (or however you want to handle that). If anything, I feel it is easier to do with Java than in, say, Haskell though I will add that my experience with Haskell doing db queries is limited.


Usually what you want is to go through the rest of the rows, finish the batch and close the transaction, and then handle all the errored rows at a higher level. "Bubble up by default" works in simple cases but falls apart as soon as you're doing any remotely high-throughput work (which has to mean chunk-at-a-time processing decoupled from the orchestration), and ends up doing more harm than good, IME. And note that in Java it's outright impossible to catch exceptions from a generic expression while preserving their type - you have to copy/paste your functions for each number of possible exception types you want to be able to handle.


I don’t see why this wouldn’t be performant:

  void transactionBoundary() {
    var list = someLongListOfElems;
    var resultList = List.of(); // a sum type for success and error conditions
    for (var l : list) {
      try {
        resultList.add(process(l));
      } catch (SpecificException e) {
        // add to resultList or errorList or whatever to further process those instances.
        // You can just use SpecificException’s fields
        …
      }
    }
  }


That's fine. But you have to manually write those ten lines everywhere instead of just doing processInTransaction(myCallback).


I'm sure you know, but List.of is immutable, you can't add to it here.


> not allowing Optional to contain null undermines most of its advantages

This isn't the first time I've heard this and I'm still not certain I understand.

To me, Optionals are an alternative way to say "no value" over the existing way - null. I like Java's Optional because it means I can stop worrying about null. An Optional that wraps a null doesn't give me that. I still need to do null checks!

Why would I want that?


One advantage of Optional is allowing you to remove null from the language and then stop worrying about null checks. But you can't get that in Java, because you can't remove null from the language. So in practice that benefit is very small - you still have to worry about null in Java code, and will for decades.

The other advantage of Optional is that it behaves consistently regardless of what's inside it, which means you can use it in generic code in a way that you can't use null. For example a common problem with null in Java is that if you call map.get(key) and the result is null, you can't tell whether that means the key isn't present in the map, or the key is present but the value is null. Whereas if you're using optionals, you can distinguish between None (key not present) and Some(None) (key is present, but mapped to None).

So Java could have added a getOptional method to Map (with a default impl), and then if getOptional(key) returns None the key isn't present, if it returns Some(null) then the key is present but mapped to null. That would be a legitimate, immediate use case, that would give people a benefit from switching to Optional today, and would start the ball rolling on using optionals, and maybe eventually in a few decades they would be widespread enough that we could start deprecating null in the language.

But instead they made it so you can't put null in Optional from day 1. Which would be great if there was a way to instantly migrate everything, but it makes it harder to start using Optional in an existing codebase and reduces the short-term benefits, for the sake of a future that (because of that very lack of short-term benefits) will never come.
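To make the map.get ambiguity concrete, a small sketch (getOptional here is a hypothetical helper in the spirit of the comment, not a real Map method), showing how a present-but-null mapping collapses once Optional refuses to hold null:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class MapNullDemo {
    // Hypothetical helper. Because java.util.Optional cannot hold null,
    // a key that is present but mapped to null collapses to the same
    // Optional.empty() as an absent key -- exactly the distinction the
    // parent comment wants to keep.
    static <K, V> Optional<V> getOptional(Map<K, V> map, K key) {
        return Optional.ofNullable(map.get(key));
    }

    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("a", null); // present, but mapped to null

        System.out.println(m.get("a"));          // null (present, value null)
        System.out.println(m.get("missing"));    // null (absent) -- ambiguous!
        System.out.println(getOptional(m, "a")); // Optional.empty -- still ambiguous
        System.out.println(m.containsKey("a"));  // true: the only way to tell
    }
}
```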


You can't add nullness to types because that would break generics.

List<String!> will still have nullable methods and fields.


List<String> can contain nulls. List<String!> would be a list all of whose elements are non-null. There's no problem. Other languages have been doing this for decades.


The end goal would be to have proper specialization, until then it can be simply runtime enforced.


I don't care how they get there, as long as they get there. In this case they're realizing that NULL makes optimization difficult and are swerving into solving this 27-year-old instance of the "billion-dollar mistake." NPEs shouldn't exist in workaday code.


I've always felt "billion dollar mistake" is unreasonably hyperbolic.

In retrospect it would be very nice to have a language-level ability to define something as optional, and require existence checking for such values.

But it wasn't so obvious at the time.


I mean, it doesn't take much for a class of errors that has existed for 55 years to cause a billion dollars in damage. Just in US labor costs, that's less than 5 hours per developer year. And I probably spend about 10 hours a year on average dealing with NPEs.


Sure, but most people see $1,000,000,000 and think only of the incomprehensible abstract - they see huge; they don't see what the value means.

They certainly don't see the amortised cost across the decades long operation of a highly successful industry.

So it comes across as a far deeper cut than the stubbed toe that it actually is.


Convenient hand-wave to distract from the question of whether or not it has been a mistake at all: "look, reverse-extrapolated to the individual man-year it's such a small number! Who cares about the sign!"

That hour you spent chasing an NPE? Yeah, it happened. (Really not that often, though.) The time saved by having a single, universally accepted way of implementing optionality, even if it's inconveniently always on? That's invisible, because we did not live through the alternative.


I believe the ML family of languages does not have null. And it was developed well before Java. So I don't think timing was the issue.


Eiffel also didn't, and additionally it supports value types (expanded classes).


> I've alway felt "billion dollar mistake" is unreasonably hyperbolic.

I guess you can take that up with Turing Award winner C. A. R. Hoare FRS FREng. I'm going to go ahead and keep citing it despite it being an epic underestimate.

> But it wasn't so obvious at the time.

It was, just not to the hackers that invented the languages plagued by these mistakes.


> But it wasn't so obvious at the time.

Sure it was. That was how it worked before. Hoare created null because of pressure from other programmers to write shitty code that looked like it worked even when it didn't.


Can you expand on that a bit? Why would having it built in to the language (Kotlin-style, for example) be better than not having the concept of null in the language at all and using something like the Rust Option?

Personally, I like the Rust approach. It seems a bit more flexible and can evolve easier over time.


I think the main issue with the Rust option is they ended up with language syntactic sugar anyways. So in some ways, it's worse than the kotlin-style because reading `let baz = foo?.bar` could have all sorts of weird type implications.

What is baz? Is it Option<Bar>? Is it Result<Bar>? Is it some user defined enum<Bar>? Who knows! You have to find that by looking up the foo definition.

For kotlin, the answer is simple. "baz is a nullable Bar"

Rust did this because interacting with the enum directly was cumbersome.

The concept of "no value" is so integral to day to day programming that elevating it into the type system with "nullable types" makes more sense to me vs using generics trickery. Even if it's slightly less "pure".


> So in some ways, it's worse than the kotlin-style because reading `let baz = foo?.bar` could have all sorts of weird type implications.

Syntax sugar that works with a normal library type is much nicer than dedicated syntax for a special-case builtin, IME.

> What is baz? Is it Option<Bar>? Is it Result<Bar>? Is it some user defined enum<Bar>? Who knows! You have to find that by looking up the foo definition.

Sure, or if you have a decent IDE you just mouseover it. But that's no different than any other method. `let baz = foo.add(bar)` doesn't tell you what type foo or bar is, and I don't think I've ever seen anyone argue that it should (e.g. by requiring method names to be globally unique).

> The concept of "no value" is so integral to day to day programming that elevating it into the type system with "nullable types" makes more sense to me vs using generics trickery. Even if it's slightly less "pure".

I've found this isn't really true. Once you don't have language-level support nudging you to use it all the time, wanting to have a possibly-absent value is actually pretty rare. (E.g. a lot of the time you want to include a "reason" for why it's absent, so you want an Either/Result-like type - but if you're using Kotlin you end up using a nullable type because you're lazy and the language makes that easier. That's bad for long-term maintainability IME, especially because if you want to switch the nullable type for a result you have to change all your code - unlike Rust where you can switch fairly easily because ?. works the same way for both types).


I rarely if ever use the question mark with 'Option' when working with rust, and it's been just fine.

I do use it a lot with Result though. But if you have any question about the return type you could[1] just look at the return type for the current function you're in.

[1] I think there's a way to define custom types on nightly, and I wouldn't be surprised if it let you map custom types on Result/Option, but I don't actually know.


I am a fan of Rust, but I feel obligated to point out the return type of the current function does not provide any guarantees about the type which '?' was used on, even on stable. Since the generic requirement to use '?' with a return type of `Result<T,E>` is just `Into<E>`, you would still need to look at the called function since `From<E> for T` could be satisfied for (almost) any T.


Ah, I did not realize that. Thanks for pointing that out!


Is there a difference between excluding null from a type and having basically syntactic sugar for ‘T | Null’ in the form of ‘T?’ ?


Yes. Having an explicit `Option<T>` type allows you to write things like `Option<Option<int>>`. This type is not possible in, say, Kotlin or Typescript. (Well, technically it is, but not using the native null types provided by the language.)

Most of the time this isn't a very useful type to have, but it has the big advantage of being mechanically obvious. For example, consider a hashmap of optional types, something like `Map<String, Option<Player>>`. If we write a get method for the map, it should return an Option to indicate whether the value was present or not. But what happens if the value is present, but it is explicitly Null?

The advantage of having an explicit option type is basically that it's easier to compose generics without having to understand what the values might be, which is usually what you want when using generics. That said, most of the time, if you've got an option of options of something, you're just going to flatten that type down anyway.

E: Thinking about it, the other advantage is that it's easier to create a separate namespace for "methods that should exist when T might be null but are meaningless the rest of the time". For example, Rust's `Option` type has methods to map the internal value into another value, unwrap the value and panic if it was null, swap the value with a different one, etc. In languages which use null | T, the equivalent is usually to use Elvis operators and similar (obj?.field), but that requires more special casing.
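In Java terms (Roster and Player are invented names), the three mechanically distinct results of such a lookup can be sketched as:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class Roster {
    record Player(String name) {}

    final Map<String, Optional<Player>> slots = new HashMap<>();

    // Nested optionals keep "key absent" and "key present, explicitly
    // no player" mechanically distinct:
    //   Optional.empty()              -> slot not in the map
    //   Optional.of(Optional.empty()) -> slot present, no player assigned
    //   Optional.of(Optional.of(p))   -> slot present with player p
    Optional<Optional<Player>> lookup(String slot) {
        return slots.containsKey(slot)
                ? Optional.of(slots.get(slot))
                : Optional.empty();
    }
}
```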


I guess one could have ‘(T | None) | None’, which can automatically be flattened to ‘T | None’, couldn’t it?

Also, the Map::get method that can return null is the problem here (even though your example is great and thanks for that, I didn’t think of it), something like ‘Option<Player?>’ should allow to differentiate between those cases.

But I do remember reading that a bottom type does make typing rules harder (e.g. scala 3’s explicit nulls feature which is basically “T is non-nullable, write it as T | Null” is not sound)


The other commenter related the types to normal binary operations, which I think explains why (T | Null) | Null can't be distinguished from T | (Null | Null). Another way of thinking about it is by asking what "untagged unions" actually means, and in the context of languages like Typescript, it generally means "unions where the tag is the type of the object". ("Untagged" here is a bit of a misnomer.) But if the tag is the type of the object, how do I distinguish between two different nulls? They both have the same type (the Null-Type), which means they both have the same tag in the union, which means they're the same.

What you say about `Option<Player?>` is a good point though. If we want to distinguish between the different results, we need an explicit Option type. But now we've got Option _and_ we've got nullable types. Which should we use? They're both doing the same thing (i.e. marking where a type may be present but might not be), so why do we need both?

In practice, my impression that nullable types are really good for integrating with languages that already have unchecked nulls in then, either for historical reasons (like Java) or because they're dynamic languages (like Python or Javascript). It's a way of acknowledging the null value in the type system without demanding that all the code that been interacting with nulls be rewritten.

However, if you were going to write your own language from scratch, it's difficult to see why you would allow nullables to exist when the Option type does pretty much everything that nullables can, but with more clarity for cases of "nested nullability". You can also still add syntax sugar for it (Rust is going down this route, for example), but you don't have to special case nullability to the same extent.


Thanks for the answer, I was genuinely curious about the advantages/disadvantages of these approaches. But indeed, types that completely exclude ‘nulls’ seem to be the cleaner solution — though I wonder whether that is even possible to retrofit to Java.


No, you cannot.

var foo: (T | None) | None = None

Which None do you mean here?

This is just basic logic: (X or Y) or Y <=> X or Y

Untagged unions cannot model that. Contrast it with tagged unions:

var foo: Option<Option<T>> = Some(None)

var bar: Option<Option<T>> = None

Now you can reason about what the None means.


I may have not been clear, but that “flattening” is what I meant, and mentioned it as a “feature” (when Some(None) and None is equally useful/useless).

Surely, if you want to distinguish between 3 states you need more data, hence my recommendation of Option<T?>


I really don't think that you want null in a modern language, at the runtime level there will eventually be a zeroed out reference field - but a programmer shouldn't have to deal with that.

So I think we largely agree.

(I left example syntax in another comment.)


Having nil (NULL, undefined, whatever) that is falsey, as is customary in Lisp/Clojure, is quite useful - the language family depends on it. Most things deal with it very well, and it is a useful distinction that something wasn't set up or didn't return any meaningful value for whatever reason.

Clojure has a powerful meta-data system which could be used for checking for errors if Clojure didn't depend on the host platform for that. Also, it seems, Common Lisp also has some ideas about how to handle things more or less in-line with other code without having a special system that you can't see most of the time.

Btw. some of this can go very deep in the technology stack, there were computers (https://en.wikipedia.org/wiki/Ternary_computer) with ternary logic (https://en.wikipedia.org/wiki/Three-valued_logic). Such a computer could improve many aspects of computing, e.g. density and therefore efficiency. It could improve reliability (things could be more explicit). You could divide by 3 precisely and efficiently. It could improve our understanding of logic by making this for most of us alien concept to a more practical and widespread tool.


What syntactic sugar they offer is not clear from the mail.

There are already lots of techniques, from annotations to Optional.ofNullable but none matches the simplicity of != null.

They can also go the route of `effectively final` and make the compiler deduce nullness instead of marking it explicitly.


It'll allow the JVM to more aggressively optimize the code (the main benefit).

Annotations are, unfortunately, all over the board. There's a dozen different `NotNull` annotations out there. Having it in the language will make it the definitive solution to this problem.

You can't do deduction at compile time because you'll end up with different method signatures if someone calls a method with a null vs not. (not to mention the problem with someone calling these methods with reflection)

You can do deduction at runtime (and they likely will), but then you have to deal with the problem of unwinding an optimization when the hypothesis is proven wrong. That can be really costly when the optimization is dealing with memory layout. (Imagine an array of value types where you deduced they should be non-null, yet some unexpected path inserts a null. In that case you have to reallocate the entire array and its elements in order to handle that.)

It's more than just syntactic sugar.


There's not really a nice way of implementing the syntax without changing the language.

Although perhaps they could resolve this with magic imports?

    import java.lang.magic.optional;

    public class Foo{
        public static void main(String...args){
            Foo? f = ...;
            if(f){
                f.doThings();
            }
            f.doThings(); //compile time error
        }
    }
The idea being that importing the magic class enables the cleaner syntax.


I doubt that it's possible to open the door for "language version by imports" without turning that decision into a forever-floodgate. Note that I am not entirely sure that it would be a bad decision (I do lean in that direction however), but it would definitely be a severe one.

It would be like a variant of inviting Dracula across the doorposts, only that this one may or may not actually be a cool guy and not a vampire, but in any case he will sure bring all his friends, and their friends and so on forever.


With great power comes great responsibility.


Discussions on the mailing list usually try to avoid the exact syntax for as long as possible to avoid bike-shedding. But according to this mail there would be a default nullability (hopefully non-nullable as the default, but it may depend on whether it is a value class or not), and ! and ? as modifiers (and * for type variables).

So `Point p = null` won’t type check; you would have to mark it as `Point?`, where Point is a value (primitive) class. It was also suggested that it should work with the usual type inference, e.g. `final var p = methodThatReturnsNullablePoint()` would be inferred to `Point?`.


Reference types probably can’t be made non-nullable by default as that would break too much. But

  String! myString = "Definitely not null";
doesn’t look too bad.

Edit: it’s worth saying that a lot of careful thought around compatibility goes into features like this. That means the core libraries should be able to transition to non-nullable types where this is already a runtime requirement, and existing compiled jars must continue to work.


I think it can work in a binary-additive way — the class file gets a new attribute for nullness information, while missing it (older class files) default to “unspecified”. That can be safely consumed by new code, which will get the updated syntax with a new option (specified on a module/package/class basis) where ‘T’ means ‘T!’ and that should be a proper subset of previously compiling code, so I don’t think anything would break that way. Similarly to how generics and unchecked warnings were rolled out.


I don’t think anyone would really use String! everywhere.


I think you’ll see it creep into parameter types in the core JDK and it will work its way out from there.


See the !! debacle for C#11 to see how changing the default went.

So now we are back to warnings when nullable reference types are enabled in the compiler settings, which can eventually be escalated by treating warnings as errors.

The biggest issue is the large ecosystem of existing libraries.


I think the general idea is pretty clear.

`MyClass! myvar` is not supposed to accept nulls. Assigning null to `myvar` would throw an exception; comparing `myvar` with null is pointless and should probably emit a warning, or maybe even an error.

`MyClass? myvar` is supposed to accept nulls. Dereferencing this value without checking for null is either a warning or an error.

`MyClass myvar` is supposed to be unspecified with regards to nulls. So basically current code. You can do whatever you want but you should refactor it to `MyClass!` or `MyClass?` some day.
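Until the language grows `!` and `?`, the same three-way distinction can be approximated in today's Java with `Objects.requireNonNull` and `Optional` (a sketch; the class and field names are made up):

```java
import java.util.Objects;
import java.util.Optional;

public class NullnessSketch {
    private final String name;           // stands in for `String!`: guaranteed non-null
    private final Optional<String> nick; // stands in for `String?`: explicitly optional

    public NullnessSketch(String name, String nick) {
        this.name = Objects.requireNonNull(name); // runtime stand-in for `!`
        this.nick = Optional.ofNullable(nick);    // callers must handle "empty", like `?`
    }

    public String display() {
        return nick.orElse(name); // no way to dereference nick without handling empty
    }

    public static void main(String[] args) {
        System.out.println(new NullnessSketch("Ada", null).display());   // Ada
        System.out.println(new NullnessSketch("Ada", "Lady").display()); // Lady
    }
}
```

The difference from the proposed feature is that these checks happen at runtime, not at compile time.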


Do NPEs exist in workaday code?

I remember working on a codebase where everything had null check pyramids like

  if (foo != null) {
    if (foo.getBar() != null) {
       Bar bar = foo.getBar();
       if (bar.getBaz() != null) {
          for (Quux q : bar.getBaz()) {
             if (q.getFoobar() != null) { ... }
          }
       }
    }
  }
but that was a long time ago in a code base that operated by passing humongous mega-objects with hundreds of fields (running into the limit of how many parameters could be in the constructor was a constant headache), it solved a lot of its problems by printing a stacktrace and shrugging, and objects would be half-constructed and methods would return null in all sorts of scenarios.
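Pyramids like that collapse into a single chain with `Optional`, where each `map` short-circuits on null (a sketch with made-up record types standing in for that codebase's classes):

```java
import java.util.List;
import java.util.Optional;

public class FlattenPyramid {
    record Quux(String foobar) {}
    record Bar(List<Quux> baz) {}
    record Foo(Bar bar) {}

    // Equivalent to the nested null checks, but linear: any null along the
    // way yields an empty Optional, and orElse supplies the fallback.
    static List<Quux> bazOf(Foo foo) {
        return Optional.ofNullable(foo)
                .map(Foo::bar)
                .map(Bar::baz)
                .orElse(List.of());
    }

    public static void main(String[] args) {
        System.out.println(bazOf(null).size());                                     // 0
        System.out.println(bazOf(new Foo(new Bar(List.of(new Quux("x"))))).size()); // 1
    }
}
```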

In comparison, the entire codebase for marginalia search has about 65 null checks in total. A decent chunk of them are in dealing with older APIs that sometimes idiomatically use nulls to communicate things (java.util.Map, BufferedReader and the Servlet API in particular), the rest in dealing with the inherently noisy nature of crawl data. So while there are a few places where nulls appear in the data, they are almost always contained to the module that produced them and don't cross interface boundaries. It's very far removed from some pinnacle of code quality, but it just gets a few engineering principles right that that other code base didn't.
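Those legacy null idioms are narrow and well understood: `Map.get` returns null for "absent", and `BufferedReader.readLine` uses null as the end-of-stream sentinel. For example:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;

public class LegacyNullIdioms {
    public static void main(String[] args) throws IOException {
        // Map.get returns null for a missing key; getOrDefault sidesteps the check.
        Map<String, Integer> counts = Map.of("a", 1);
        System.out.println(counts.getOrDefault("a", 0)); // 1
        System.out.println(counts.getOrDefault("b", 0)); // 0

        // BufferedReader.readLine returns null at end of stream.
        try (BufferedReader r = new BufferedReader(new StringReader("x\ny"))) {
            int lines = 0;
            while (r.readLine() != null) lines++;
            System.out.println(lines); // 2
        }
    }
}
```

Under the proposed markers, these return types would presumably be spelled `Integer?` and `String?`, making the sentinel explicit in the signature.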


Maybe this dates my education, but in early Java, null-checking everything was promoted as good practice. Even today, depending on what your teacher said was correct and on your coding experiences (e.g. if you came from C++, or read 'Effective Java'), you might still null-check everything.

Personally, I lean towards the "use nulls sparingly and be very explicit where null values are possible", use immutable objects etc. This reduces the unnecessary null-check noise and makes things more robust. In java, I even prefer to add @Nullable to fields to try to switch the default mindset to "assume non-null unless @Nullable annotation exists" - though another developer made a note on a code review that "that is not how Java is designed and @Nullable is just noise".

Having nullability as part of the type makes everything more explicit, so I'm a huge fan. Beyond just the perf and safety benefits, it's a win for domain modeling.


ML solved it in the 70's, Eiffel in the mid-90's, however just like NPE there were quite a few things that Java didn't adopt, and here we are.


> In this case they're realizing that NULL makes optimization difficult

How so? Null checks are mostly “free” on the JVM, they trap on the zero page and get converted to an exception via a signal handler.


It’s not only the checks, but also the storage layout. An Integer that’s guaranteed not to be null can be stored in 4 bytes, a “Point” class with two “int” fields that is guaranteed to be non-null can be loaded into a 64-bit register, etc. That makes it easier for the compiler to generate good code.
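Today's primitives illustrate the sizes involved, and the two-int Point can even be packed into a single long by hand, roughly as a compiler could for a non-null flattened value (a sketch):

```java
public class LayoutSketch {
    public static void main(String[] args) {
        // The payload of a guaranteed-non-null Integer is just 4 bytes...
        System.out.println(Integer.BYTES);                   // 4
        // ...so two ints fit exactly in one 64-bit register.
        System.out.println(Integer.BYTES * 2 == Long.BYTES); // true

        // Packing a Point(x, y) into one long: x in the high 32 bits,
        // y (masked to avoid sign extension) in the low 32 bits.
        int x = 3, y = -7;
        long packed = ((long) x << 32) | (y & 0xFFFFFFFFL);
        System.out.println((int) (packed >> 32)); // 3
        System.out.println((int) packed);         // -7
    }
}
```

A nullable Integer, by contrast, needs a full reference plus an object header just so null stays representable.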


Eiffel was already doing this with expanded classes and non void references, but yeah, who would have thought such details are relevant. :)



