Value Type Hygiene (java.net)
103 points by kjeetgill 7 months ago | 57 comments

What a fascinating design constraint: to retrofit a useful feature into the Java ecosystem while keeping as many things backwards-compatible as possible. This proposal has a much stronger answer to backward compatibility than the previous one, by letting all ported builtin classes keep the same descriptor in the bytecode. The catch is that each class file will now declare which of the types it imports are ValueTypes, and which aren't.

Java's strong commitment to bytecode-level backwards-compatibility is sometimes to the detriment of its feature-set (e.g. non-reifiable generic types due to type erasure), but this commitment is a large contributor to the platform's success.

Non-reified generics is the feature that allows different JVM languages to interoperate much more easily and share data structures. If you look at the CLR, interop there is much messier, largely due to this. The reason is that if reference-type generics were reified, then the JVM must know the subtyping relationship between such different types, and different languages have very different variance strategies. This limitation does not apply to types that cannot be subtyped, which is why value type generics can (and will) be reified.

In exchange, you pay with a minor nuisance: mostly the inability to overload methods with the same erased types. I think most would agree that having a truly polyglot VM that allows different languages to enjoy the data structures written in other languages is much more important.
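To make the overloading nuisance concrete, here is a minimal Java sketch. The class and method names are made up for illustration; the point is only that both parameterizations of a list share one runtime class, which is why two overloads that differ only in the type argument cannot coexist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names here are illustrative only.
final class ErasureDemo {

    // Both List<String> and List<Integer> erase to java.util.List, so declaring
    // a second overload such as
    //     static String describe(List<Integer> xs) { ... }
    // next to this one is rejected by javac as a name clash (same erasure).
    static String describe(List<String> xs) {
        return "list of " + xs.size() + " strings";
    }

    // At run time the two parameterizations share a single class object.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());                // true
        System.out.println(describe(Arrays.asList("a", "b"))); // list of 2 strings
    }
}
```

The usual way around the clash is simply to give the two methods different names.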

It’s an interesting trade-off. It makes Java a somewhat weaker language but the JVM a better platform. At the time the decision was made, probably no one was thinking of the JVM as a polyglot platform. So there was a decision whose chief benefit was only revealed in retrospect.

It's also possibly an unnecessary trade-off.

Scala, for example, has something akin to reified generics, without any support from the JVM itself. It also allows for a decent level of interop with other JVM languages, though of course all you can really "export" to other languages is the least common denominator semantics required by the JVM.

Java could theoretically do something similar: Maintain type information that is visible to Java at compile- and run-time, while still compiling to code that is consumable from other languages without too much hassle. Really, personally, I only care about it being available at compile time. Haskell, for example, erases types much more aggressively than Java does, while also maintaining a high level of type safety.

I guess I don't know that it's backwards compatibility on the JVM that bothers me so much as Java's tendency to think of itself as being little more than a JVM macro assembler with C-like syntax.

That's not so simple. The class file, I believe, does have the generic instance at the use site, but then if you could have two overloads, foo(List<String>) and foo(List<Integer>), which would you call if you use a raw type (either in Java or from other languages)? It's not so much that Java is a JVM macro assembler as that there is not much to be gained from this feature given the rest of Java, and Java has a very different design philosophy from Scala, as it aims at a different audience. If and when Java evolves in that direction, perhaps this feature would change. In the meantime, those who like Scala can use Scala.

This feels to me more like a failure of imagination than a technical impossibility. C#, for example, handles some things like this by exporting a function with a special name in the compiled modules. Java could do the same - handle the overload by having them be two different methods named something like foo__String() and foo__Integer() on the back end. The naming may not be beautiful, but I think that, if you do it right (for starters, try to use more readable naming conventions than Scala does), it can make for a pretty sane user experience for consumers who are using different languages. Naturally, the best practice would also be to design so that the API remains clean for packages that are meant to be consumed by multiple JVM languages.

I also want to point out that having a type system with a stronger handle on generics is about more than just ergonomics when overloading methods. As another example, Java's story for auto-mapping libraries (serialization/deserialization, lightweight ORM, stuff like that) is materially hindered by the inability to express things like "List<MyDataType>.class". And those sorts of things do have a material impact on developer productivity - when I'm working in Java, I find that I spend an inordinate amount of time writing, maintaining and code reviewing all the extra glue code that needs to be written to handle tasks that other languages make almost effortless.

> This feels to me more like a failure of imagination than a technical impossibility.

It's not the lack of possible solutions, but more of a design philosophy.

> is materially hindered by the inability to express things like "List<MyDataType>.class"

See Guava's TypeToken https://google.github.io/guava/releases/25.0-jre/api/docs/co...
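For readers who have not seen the trick behind such type tokens, here is a rough sketch of the general idea using only java.lang.reflect. TypeRef is a hypothetical class written for this example, not Guava's API; it relies on the fact that an anonymous subclass records its generic superclass in the class file.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical minimal "super type token"; this is NOT Guava's TypeToken.
abstract class TypeRef<T> {
    private final Type type;

    protected TypeRef() {
        // An anonymous subclass keeps its generic superclass, for example
        // TypeRef<List<String>>, in its class file; reflection can read it back.
        Type superType = getClass().getGenericSuperclass();
        if (!(superType instanceof ParameterizedType)) {
            throw new IllegalStateException("TypeRef must be created with a type argument");
        }
        this.type = ((ParameterizedType) superType).getActualTypeArguments()[0];
    }

    Type type() {
        return type;
    }
}

final class TypeRefDemo {
    static String captured() {
        // The trailing {} creates the anonymous subclass that carries the type argument.
        return new TypeRef<List<String>>() {}.type().getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(captured()); // java.util.List<java.lang.String>
    }
}
```

This recovers the full generic type at the cost of an extra anonymous class per use, which is part of the glue-code complaint above.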

> I find that I spend an inordinate amount of time writing, maintaining and code reviewing all the extra glue code that needs to be written to handle tasks that other languages make almost effortless.

Which is why the Java ecosystem offers a nice selection of languages to match almost everyone's taste. Many people like Java and feel productive using it, and those who don't can enjoy the great polyglot interop afforded by, among other things, generic type erasure.

I don’t really see any interoperability difficulty at all between C# and F# on the CLR, can you please be more specific? As far as I remember, it was instead interoperability between Java and Scala that was quite a pain, because of the different data structures.

That's because both were designed to specifically work with how the platform is designed. But compare Python/C# and, say Python/Java or Clojure/Java interop and code sharing.

Ok, then are you saying that the problem is that the authors of IronPython didn’t spend as much time and effort integrating it into the .NET platform compared to the work that has been done with OCaml? From what I know, it was actually Don Syme, the creator of F#, who brought non-erased generics to .NET. And if I remember correctly, without his work on generics it would have been more painful to get nice interop between C# and F#. This seems quite at odds with your assertion that, on the contrary, erased generics make interop between different languages on the same JVM/CLR easier.

All I can say is -- it's complicated. When languages share much in common, then factoring stuff into the runtime can make things easier, but when they're very different, the more language-level assumptions you bake into the runtime the harder it is to share code. You want to strike a good balance between having a rich runtime that makes it easy to implement languages and not baking too much language semantics into the runtime, to make it easier to share code. In retrospect, I think that the tradeoff of not having generics reified at the JVM, causing a minor nuisance in Java but making code sharing and interop easier across a wide range of languages is a net gain.

I still can’t see any proof in your posts that using type erasure for generics makes it easier to have interop between languages. And I can’t think of a single reason why that should be the case. Python doesn’t have the concept of generics at all. Why should it be any more difficult to implement interop between Python and a language with proper generics than with a language with erased generics? For Python it will always be an object, and when getting the result in Java or C# you will still need to cast it properly.

Here's an example: A Clojure list is a Java list, but Clojure is untyped. However, if you create a list of strings in Clojure, you can call any Java method that takes List<String> even though Clojure has no such concept. AFAIK, on the CLR, Python requires special treatment to call such a method. Also, Kotlin's and Java's variance strategy is very different. Nevertheless, Java's generic data structures are reused in Kotlin.
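A rough way to see this from the Java side alone: in the sketch below a raw List stands in for a list built by a language with no notion of element types. The class and method names are hypothetical. The call goes through because List<String> and the raw List are the same class at run time; the price is that a wrong element is only detected when it is read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; a raw List stands in for a list built by an untyped language.
final class UntypedCallerDemo {

    // An ordinary generic Java API.
    static String joinAll(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    // The caller never mentions String as a type argument, yet the call works.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String callWithStrings() {
        List untyped = new ArrayList();
        untyped.add("value ");
        untyped.add("types");
        return joinAll(untyped);
    }

    // With a non-String element the call is still accepted; the failure
    // surfaces as a ClassCastException when joinAll reads the element.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean failsOnlyWhenElementIsRead() {
        List untyped = new ArrayList();
        untyped.add(42);
        try {
            joinAll(untyped);
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithStrings());            // value types
        System.out.println(failsOnlyWhenElementIsRead()); // true
    }
}
```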

Mmm.. ok, in IronPython you obviously need to specify the generic type that you are passing because it is needed to resolve the correct method, but in Java you wouldn’t. For me it’s a perfectly acceptable compromise if I can specify methods with the same signature and you can avoid passing around Class references to infer the type.

Why do you need to pass around class references? That's exactly what I was saying about doing something wrong. In over 15 years of programming in Java heavily, I don't think I had to do that more than a couple of times.

Adding to the examples, Scala has declaration-site variance while Java has use-site variance. Erased generics allow the two to work together.
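For anyone unfamiliar with the terms, this is roughly what use-site variance looks like in Java: the variance is written as a wildcard at each method that uses the type, not once on the declaration of List. The names below are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of Java's use-site variance (wildcards).
final class VarianceDemo {

    // Covariant use: accepts List<Integer>, List<Double>, List<Number>, ...
    static double sum(List<? extends Number> xs) {
        double total = 0;
        for (Number n : xs) {
            total += n.doubleValue();
        }
        return total;
    }

    // Contravariant use: accepts List<Integer>, List<Number>, List<Object>.
    static void addOneAndTwo(List<? super Integer> sink) {
        sink.add(1);
        sink.add(2);
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        System.out.println(sum(ints));    // 6.0

        List<Number> numbers = new ArrayList<>();
        addOneAndTwo(numbers);
        System.out.println(sum(numbers)); // 3.0
    }
}
```

Since the wildcards exist only for the compiler, another language is free to attach its own variance rules to the same erased class.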

It's worth noting that Don Syme also brought type erasure to the .NET platform via type providers, because provided types can be erased (and typically are, I believe). But, F# also lets the author of the type provider decide which class will be used to represent the erased type. So, unlike on the JVM, an F# programmer can erase to something other than System.Object.

For what it's worth, Brian Goetz doesn't see type-erasure as such a failure and defends the design choice[0].


I'd say the mainstream programming language theory view is that type erasure is the correct approach.

See https://en.wikipedia.org/wiki/Parametricity

Just a nitpick: programming language theory makes absolutely no judgment on what is a correct or incorrect approach [1]. It just studies the implications of various language designs. In this case it says something close to, if you have erasure then you have parametricity, and it also says what parametricity guarantees. It does not and cannot, however, claim that this is the "correct" way. In general, mathematical theories do not have an intrinsic notion of empirical value.

[1]: Although some PL theorists may, but this is an expression of their opinion, not something that comes out of the theory.

Pardon my ignorance but I don’t see how the Wikipedia link supports your statement. Can you please elaborate?

"parametrically polymorphic" == has a generic type. So a function

    def foo[A](x: A): A = ??? // Implementation hidden for now

is parametrically polymorphic, where A is the generic type (or type variable).

The principle of parametricity says that `foo` should act in the same way no matter what type of value it is called with. Type erasure ensures this, because it prevents `foo` from inspecting the (reified) type at runtime. If types are not erased then `foo` can behave differently when given, say, a `String` or an `Int`.

Why is parametricity important? It makes code easier to reason about. Parametricity means no special cases to remember. It makes everything more uniform and simpler. In fact there is only one valid implementation of `foo` above, and this is the identity function.

    def foo[A](x: A): A = x

Looking just at the type signature of `foo` tells you everything there is to know about it---which is quite remarkable! This reasoning principle is sometimes called "free theorems".

> If types are not erased then `foo` can behave differently when given, say, a `String` or an `Int`

Perhaps worth pointing out that this doesn't apply to languages like Java and C# though (which I think is the most common comparison, since generic support is quite similar between the two, but Java has type erasure and C# doesn't) as both languages support `instanceof` and `getClass`, which would allow a function to change its behaviour based on the type of its arguments.

It seems that once the type of an object can be determined at runtime, the argument that it is beneficial to hide the type of a generic parameter is quite a bit weaker.
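A small Java sketch of that point, with made-up names: the signature below looks like it can only be the identity function, yet erasure of the type parameter does not stop the body from checking the runtime class of the argument.

```java
// Hypothetical demo: a generic method that is not parametric in practice.
final class ParametricityDemo {

    // The signature suggests an identity-like function...
    static <A> A foo(A x) {
        // ...but instanceof lets the body special-case one runtime type.
        if (x instanceof String) {
            @SuppressWarnings("unchecked")
            A changed = (A) ((String) x).toUpperCase();
            return changed;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(foo("abc")); // ABC
        System.out.println(foo(42));    // 42
    }
}
```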

It also removes some useful features, eg, without type erasure, this is possible:

including compile-time checks that the collection can contain elements of type Foo.

Yeah, it's an interesting design decision in the context of Java. There are certainly lots of Java developers who think erasure is a mistake. I think things like `instanceof` are the mistake, and reifying generics is piling mistakes on top of mistakes, but then I've been hopelessly corrupted by FP. I freely admit this isn't a useful stance to take if one is focused on evolving Java as it currently exists! :-)

My above post was mostly about explaining the concept of parametricity, though, and not so much about justifying it.

Yes, I have to agree about `instanceof` and friends being tricky. Often they seem to come up as a quick fix for some larger architectural problem, resulting in long-term debt.

The idea of inferring the set of possible implementations of a function from its signature is neat.

I view non-reified types as a mistake in Java, because it's easy to work-around:

<T> void doStuff(Class<T> type)

There, now you are guaranteed by the compiler to have the type of `T` at runtime. With this simple a work-around, it seems silly that they just didn't properly reify them and call it a day. That being said, it's also such an easy work-around that it's simply annoying at this point when I need to include the extra parameter(s) to get the type.
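A fuller sketch of that workaround, using a hypothetical helper: the Class<T> argument carries the type into the method so it can test and cast at run time. Note that a Class<T> can only name a raw class such as List, never List<String>, which is the limitation raised elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical demo of the class-token workaround.
final class ClassTokenDemo {

    // The Class<T> parameter makes T available at run time.
    static <T> Optional<T> firstOfType(Class<T> type, List<?> items) {
        for (Object item : items) {
            if (type.isInstance(item)) {
                return Optional.of(type.cast(item));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Object> mixed = Arrays.asList("a", 2, 3.0);
        System.out.println(firstOfType(Integer.class, mixed)); // Optional[2]
        System.out.println(firstOfType(Long.class, mixed));    // Optional.empty
    }
}
```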

This is an argument against specialization, not against reification. With type erasure your compiler can't perform type-specific optimizations, at least not without a JIT and borrowing some optimization techniques from dynamically typed languages.

I don't think we agree on definitions of terms here. For me type erasure means that the programmer cannot access a runtime representation of a type. A "runtime representation" is what I mean by "reified", as from the programming language theory perspective types only exist at compile time (and it is values that exist at runtime).

Type erasure, as I use the term, is compatible with type specific optimisation. For example, the MLton compiler for Standard ML supports monomorphisation of polymorphic code, which in turn allows unboxing: http://mlton.org/Monomorphise The compiler is free to use whatever representation it likes, which includes a reified representation for use in, say, dynamic linking, so long as it doesn't break the language semantics (by exposing this to the programmer, for instance).

Of course, guess I shouldn't comment before the caffeine kicks in... I was thinking without reified types the JIT can't optimize it easily then got stuck on that. Of course the compiler could do so, Java just doesn't have an optimizing compiler.

I believe that the class file does retain the specific generic instance at the use-site, it's just that there isn't much type-specific optimization that's applicable to reference types beyond what HotSpot does anyway (which is stronger than what would be available by the generic instance type information). It's possible that Java AOT compilers could make use of this information.

I see, but what if I want to write a generic function where these types are somewhat constrained? (eg: ordered, numeric, is functor) what technique should I use then?

The argument goes something like "Parametricity is a good thing and you can't have parametricity without erasure".

I can't wait for Value Types to land! The contortions my code has to go through to fit in RAM and avoid fighting the garbage collector drive me nuts. Problems that might fit in a few megabytes in, say, C/C++ can take gigabytes on the JVM. I have always looked at the C# structs with envy.

Rah rah I hope the authors peek on HN and see that their work has baying fans :)

I help run OpenJDK advocacy - and yeah, we notice. The authors aren’t always able to jump in on threads, but we collate feedback for them.

C# has much more than only structs.

You can do manual memory management, GC free code blocks and there is unsafe.

Additionally C# 7.x has safe stack allocation, spans, references to stack allocated data.

Looking forward to having some of those goodies in Java as well.

> Additionally C# 7.x has safe stack allocation, spans, references to stack allocated data.

How many of those are the direct result of having value types in the language?

As far as the other features you mentioned, it would be nice to have them in the JVM as well.

All of them.

They were already available in other languages like Modula-3 and Oberon(-2) before Java came to be, but the Java team decided they were too complicated for their target audience.

Just like Sun was religiously against AOT, a feature only available in commercial third party JDKs, they also believed it was a matter of making the JIT good enough for escape analysis.

Thankfully the .NET team had other beliefs.

Value types are stack allocated types (unless they are a member of a reference type). You might be referring to stack-only value types, which are prevented from being copied to the heap.

I am referring to array values.

Up to 7.3 you could only stack allocate arrays (alloca() style) inside unsafe code blocks.

Starting with 7.3 you can safely stack allocate arrays of value types.

What data structure design are you using that takes requirements from megabytes to gigabytes?

Silly quick question: What is the "L-world" discussed here?

The letter L is used in the JVM bytecode specification to refer to objects (as opposed to primitives or arrays). You'll often see that notation when using jmap or javap.
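If you want to see the notation without opening a class file, the standard library can print descriptors for you. A minimal sketch follows; DescriptorDemo and descriptorOf are names invented for this example, while MethodType.toMethodDescriptorString and Class.getName are standard JDK methods.

```java
import java.lang.invoke.MethodType;

// Hypothetical demo that prints JVM descriptors via java.lang.invoke.MethodType.
final class DescriptorDemo {

    // Builds the bytecode-level descriptor for a method with the given
    // return type and parameter types.
    static String descriptorOf(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Object types appear as L<internal name>; primitives use single letters.
        System.out.println(descriptorOf(void.class, String.class));            // (Ljava/lang/String;)V
        System.out.println(descriptorOf(int.class, Object.class, long.class)); // (Ljava/lang/Object;J)I
        // Array class names use the same L...; form for their element type.
        System.out.println(String[].class.getName());                          // [Ljava.lang.String;
    }
}
```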


Thanks! I'd seen that notation before, but I didn't connect it to this context.

I think this has to do with the class file representation of value types, where the L-world is somehow closer to the current state, whereas the Q-world introduces major differences.

Presumably I'm missing something, but are they trying to do something more than just Boxing value types? (Specialization?) I cannot see how legacy classes would be impacted if Boxing was used, nor quite why it is going to be so difficult. Boxing would seem to deliver most of the benefits (i.e. the ability to do things in a memory efficient manner in new code).

Yes, the point is to not box them.

Yes, in ArrayLists ... etc., but if you want compact values on the stack, or compact value data structures you don't need this. I feel like 80% of the solution (and one which is likely future compatible) is going begging whilst they worry about nice to haves.

Happy to see value types making their way, however I'd put good money on not seeing this in Android until 2030 at the earliest.

In the Android world, it's quickly becoming what Kotlin supports.

Still need JVM support of value types for this to make sense.

Android doesn't use the JVM. Hell, it doesn't even really use a VM any more.

Sure it does, ART was AOT only during Android 5 and 6.

Android 7 brought the VM back.

Now it uses an interpreter hand written in Assembly that gathers PGO info for the JIT, which also performs PGO data gathering, then only the hot bits are AOT compiled to native code, when the device is idle.

Any change on the execution path or updates trigger the execution process from the beginning.

I didn't know that. Here are the docs: https://source.android.com/devices/tech/dalvik/jit-compiler

The docs are outdated, Android P is bringing sharing of PGO data across devices via the Play Store.


I'm going to stand by my statement of "not really a vm." Calling it a plain vm when the goal is to run as much AOT code as possible is too misleading.

I call it whatever the CS theory of programming language says.

And from that point of view, ART on Android 5/6 was a runtime, turned into a VM on Android 7.

From your point of view, the CLR is also not a VM.

The Android world only stands to lose by dropping compatibility with Java libraries written in Java 9 and newer versions.

Or do you think now everyone is going to rewrite their Java libraries into Kotlin, using JVM 8 only features?
