
If they're doing breaking changes to the VM, I wonder if we can get rid of type erasure. Generics seem really nice until you actually use them to do anything mildly complicated, then type erasure rears its ugly head and you now have a type parameter and a Thing.class normal parameter and reflection.

I only thought the erasure was bad because of how Java lacked value types (so my list of T will use heap boxes when T is int).

I haven't given it a ton of thought, but I thought that apart from reflection it would actually be nicer if the runtime erased types. I.e. at runtime a type should just be either a stack/copy/move type or a reference type. So erase all reference types to objects.

It feels like it's the job of the compiler to handle these. Obviously at runtime you can get some extra safety guarantees from reified types, but at the cost of languages on the runtime being more difficult to adapt to other paradigms etc.

I imagine a runtime that doesn't have strong opinions about types to be easier to make type level language changes for (i.e. the language(s) can evolve without the need for runtime changes), and it should also be easier to support entirely different type systems.

Erased generics are awful compared to proper generics like C#. If you ever used both of them it should be clear as the sun. In C# you don't have to pass around the type in a method parameter and use reflection. I’m missing how you can claim that the languages on the runtime with proper generics are more difficult to adapt to other paradigms given the extremely fast evolution of C# and F# compared to Java that for example received closures something like 10 years after C#.

That's a relatively minor syntax issue, fixable with something like Kotlin.

All runtimes need to erase types at some level. Otherwise ArrayList<Foo> and ArrayList<Bar> would end up compiling identical versions if both Foo and Bar are reference-only types, which just wastes memory. At some level the compiler and runtime need to merge duplications - in C++ that feature is called COMDAT folding, or used to be.
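As a rough illustration of the sharing described above, here is a minimal Java sketch. The class and method names are made up for the example. It only shows that two parameterizations of ArrayList are backed by a single runtime class on the JVM; it says nothing about how the JIT lays out machine code.

```java
import java.util.ArrayList;
import java.util.List;

class SharedRuntimeClass {
    // Returns true when both parameterizations are backed by one runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // After erasure both objects are plain java.util.ArrayList instances.
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
    }
}
```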

.NET has had serious problems with code duplication in the past. Here's an excellent blog post by a Microsoft engineer on it:


Your comment is a bit misleading, because .NET has always only instantiated one version of a generic for all reference types. Even the article you posted backs this up, though I'm sure I've read the same on MSDN.

"... instantiations over reference types are shared among all reference type instantiations for that generic type/method, whereas instantiations over value types get their own full copy of the code."

Just about the only thing it could do better is to reuse the same instantiation for all value types of the same size.

.NET always preserves generic type information for reflection but there is code reuse when a generic type is used multiple times with different reference types, e.g. there is little overhead having List<object> and List<string>

Duplication exists with generics and value types, e.g. List<long> and List<DateTime> are entirely separate code. It's just a thing to keep in mind when mixing them with value types.

Don't Haskell and Scala also erase the types? If I remember correctly Martin Odersky even said erasure is better for some things in one of his videos/keynotes (?), but I'm not sure if I remember that correctly.

Haskell totally erases types and has a mechanism to recover that information as values. Both are way safer and easier to use than Java's versions.

Scala follows a similar route. It erases types, but lets you reify on demand in form of TypeTags.

Except not:

    x match {
      case _: List[Int] => 1
      case _: List[Char] => 2
      case _: String => 3
    }

This uses Java's semi-erased class tags and doesn't report a TypeTag constraint in its type. It will also return 1 when x is of type List[Char], since partial erasure means that the first two patterns end up being identical. The compiler will warn you about this situation, but generally it shows up all the time for various reasons. Super bad news.
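For comparison, the same limitation can be sketched in Java, where the runtime check can only test for the erased List type. This is an illustrative sketch, not taken from the thread; `classify` and `castFailsLate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ErasedCastDemo {
    // Mirrors the Scala match: the element type cannot take part in the test.
    static int classify(Object x) {
        // `x instanceof List<Integer>` is rejected by javac for an Object operand,
        // so only the erased form List<?> can be checked at run time.
        if (x instanceof List<?>) {
            return 1; // every List lands here, whatever its element type
        }
        if (x instanceof String) {
            return 3;
        }
        return 0;
    }

    // An unchecked cast succeeds; the failure only shows up on element access.
    @SuppressWarnings("unchecked")
    static boolean castFailsLate() {
        List<Character> chars = new ArrayList<>();
        chars.add('a');
        Object x = chars;
        List<Integer> ints = (List<Integer>) x; // not verified at run time
        try {
            Integer first = ints.get(0); // compiler-inserted cast to Integer fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```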

That all said, the TypeTag system could be very nice someday. Especially if asInstanceOf were dropped eventually or, better, relied upon TypeTag.

Scala does. It's annoying in pattern matching clauses.

and yet it prevents you from shooting yourself in the foot at run time (if you have `-Xfatal-warnings` enabled), so the compiler has your back.

When Dotty and Java 10 land many annoyances will go away thankfully.

Java maintained compatibility between non-generic containers and generics.

Look at C# on the other hand and you see they had to add entirely new types which means that the .NET framework has an ugly split between APIs based on the old container classes and those based on the new container classes. That fits the trend that C# is a better language than Java, but Java has better class libraries to work with.
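To make the compatibility point above concrete on the Java side, here is a minimal sketch. `legacyAppend` is a hypothetical stand-in for a pre-Java-5 API that only knows the raw List type; the same ArrayList object flows through both worlds without conversion.

```java
import java.util.ArrayList;
import java.util.List;

class RawTypeInterop {
    // Stand-in for old, non-generic code (hypothetical): it takes a raw List.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void legacyAppend(List list) {
        list.add("added by legacy code");
    }

    static List<String> demo() {
        List<String> names = new ArrayList<>();
        names.add("generic caller");
        legacyAppend(names); // a List<String> is accepted where a raw List is expected
        return names;
    }
}
```

The price is that the compiler cannot stop the raw-typed code from adding a non-String element, which is why such calls produce unchecked warnings.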

But this split happened in 2005. The ecosystem fully migrated to generic containers almost instantly, and you won't find the old containers being used anywhere.

They took a risk, and it paid off. In my opinion, C# is both a better language and has better class libraries.

To be fair, C# was a lot less entrenched at that time than Java was, even though I think Java could have made the same step at the time. But the Java maintainers probably had good reasons for that decision as well.

As for better class libraries, I found the .NET BCL to be excellently designed and thought through. It's also very consistent throughout. Now, parts of the FCL, like System.Windows.Forms are another matter ...

There are still some warts, such as there not being an ISet<T> before .NET 4, but Java's standard library has its share of quirks and historical weirdnesses as well. And as libraries age there are always old ways of doing things you can never really remove, and newer ways that are better. Neither of the two is as bad as C++, but depending on what you do you can stumble around in a swamp of old APIs for a while before finding what you're actually supposed to use.

> In C# you don't have to pass around the type in a method parameter and use reflection.

I know - but I'm not familiar with the situation where I'd have to do that in Java. Do you mean e.g.

    void Foo(object x) {
      if (x is List<int>) { /* ... */ }
    }

The canonical example in the standard library would be this one from the entire collections framework:

    public Object[] toArray()
    public <T> T[] toArray(T[] a)

You cannot turn an ArrayList<T> into a T[], for example, without passing a T[] into the function so that you can grab the type from the passed parameter at runtime. C#, which retains the generic information at runtime, doesn't need this, so you can just do

    public T[] toArray()

The other place I've seen this come up is with exceptions: you cannot catch a generic exception. It's pretty annoying now that Java 8 has streams, because you cannot have a checked exception abort out of processing a Stream unless you either 1) catch Exception (rather than the specific subclass) and deal with everything or 2) catch the checked exception in the lambda, wrap it in a runtime exception, rethrow it, and then catch the wrapped exception.
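Both workarounds described above can be sketched in a few lines of Java. This is only an illustration: `load` is a hypothetical operation that throws a checked exception, and UncheckedIOException is just one convenient wrapper type for option 2.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ErasureWorkarounds {
    // The caller supplies an array so the element type is available at run time.
    static String[] toStringArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    // Hypothetical operation that throws a checked exception.
    static String load(String name) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("empty name");
        }
        return name.toUpperCase();
    }

    // Option 2: wrap inside the lambda, unwrap again outside the stream.
    static List<String> loadAll(List<String> names) throws IOException {
        try {
            return names.stream()
                    .map(n -> {
                        try {
                            return load(n);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .collect(Collectors.toList());
        } catch (UncheckedIOException e) {
            throw e.getCause(); // getCause() is declared to return IOException
        }
    }

    // Small helper so the behaviour can be observed without a checked exception.
    static boolean abortsOnFailure() {
        try {
            loadAll(Arrays.asList("a", ""));
            return false;
        } catch (IOException e) {
            return true;
        }
    }
}
```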

Basically every time you need to instantiate a type passed as a type parameter. You either need to pass the appropriate class literal or a factory function. And if you need to create an instance of a generic type parameterized by a type parameter (e.g. Collection<T>) you need a factory or else resort to raw types and/or ugly casting.
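A minimal sketch of the factory-function variant, assuming Java 8 or later. The class and method names are invented for the example; the point is that the caller hands over a Supplier because the method body cannot write `new T()` or `new C()` itself.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;

class FactoryInsteadOfNewT {
    // Instantiating T: the factory stands in for `new T()`.
    static <T> void addNew(List<T> list, Supplier<T> factory) {
        list.add(factory.get());
    }

    // Instantiating a generic collection of T: the factory stands in for `new C()`.
    static <T, C extends Collection<T>> C collectionOf(Supplier<C> collectionFactory, T item) {
        C result = collectionFactory.get();
        result.add(item);
        return result;
    }

    static List<String> demo() {
        List<StringBuilder> builders = new ArrayList<>();
        addNew(builders, StringBuilder::new);
        // Explicit type arguments are spelled out here only for clarity.
        return FactoryInsteadOfNewT.<String, ArrayList<String>>collectionOf(ArrayList::new, "hello");
    }
}
```

A constructor reference such as `ArrayList::new` keeps the call site short, but the extra parameter still has to be threaded through every generic layer that wants to create instances.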

Do you mean things like this (C#)?

    void AddAnItem<T>(List<T> aList) where T : new()
    {
        aList.Add(new T());
    }

Or for instantiating generic types:

    public TColl CreateACollectionAndAddAnItem<TColl, T>(T item)
        where TColl : ICollection<T>, new()
    {
        TColl aList = new TColl();
        aList.Add(item);
        return aList;
    }

    // Usage 
    List<string> myList = CreateACollectionAndAddAnItem<List<string>, string>("hello");

Both those examples are exactly the type of thing you cannot do in Java because of type erasure. For instance, your first example would need to be:

    static <T> void addItem(List<T> list, Class<T> type) throws ReflectiveOperationException {
        T item = type.getDeclaredConstructor().newInstance();
        list.add(item);
    }

Can't that be solved by passing these types around under the hood, but allowing the language to sugar them out when they are known? I thought this kind of half-erasure was how some languages already operated on runtimes that erase generics.

That's basically what Java does. Java class definitions contain the full, reified type information, but when the runtime loads the class, those types are lost. Only the compiler uses them.

There are some ways to introspect generic types in Java, but you need a concrete binding. For instance, if a method returns List<Integer>, you can in fact see that it returns List<Integer> and not just List. Method.getGenericReturnType() would return a ParameterizedType with a List raw type and Integer type arguments. But that requirement for it to be a concrete binding means it's not really helpful from the context of writing a generic class or method in the first place.
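The introspection described above can be tried with a short sketch. The class and its two methods are invented for the example; only the java.lang.reflect calls are standard API.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Collections;
import java.util.List;

class GenericSignatureDemo {
    // Concrete binding: the signature names List<Integer> explicitly.
    public static List<Integer> numbers() {
        return Collections.emptyList();
    }

    // No concrete binding: T is only a type variable in the signature.
    public static <T> List<T> anything() {
        return Collections.emptyList();
    }

    static Type returnTypeOf(String methodName) {
        try {
            Method m = GenericSignatureDemo.class.getDeclaredMethod(methodName);
            return m.getGenericReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // First type argument of the declared return type.
    static Type firstTypeArgument(String methodName) {
        ParameterizedType p = (ParameterizedType) returnTypeOf(methodName);
        return p.getActualTypeArguments()[0];
    }
}
```

For `numbers` the type argument comes back as `Integer.class`; for `anything` it is only a TypeVariable named T, which is the limitation the comment points at.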

Using the above example, I'm not sure how you could desugar that without having the type known to the runtime. The generic method is going to be the same code no matter the type provided, but the type provided is necessary in order to know the type to construct. So either the runtime must provide the type, or it must be given as a parameter.

Additionally, glossing over the issue like that creates a huge trade-off. Now programmers must build a mental model of when the compiler can and can't do the type binding. The difficulty of building such a mental model accurately is one of the central complaints against Rust's borrow and lifetime checker(s).

You don’t have to in Kotlin either, and that’s on the JVM.

IIRC Kotlin can only do it in inline functions, on type parameters explicitly marked reified.

> I only thought the erasure was bad because of how Java lacked value types (so my list of T will use heap boxes when T is int).

That's covered by a sibling Valhalla feature called 'generic specialisation'. See http://openjdk.java.net/jeps/218.

Yeah that looks like an excellent step (and one that should have been taken around the same time as .NET 2 was released). Getting better performance from a MyIntList than from an ArrayList<int> is such a horrible design smell.

Those are not breaking changes. Existing byte code will continue to work.

My understanding of generic type erasure is that it is more of a language issue than a VM issue. Nothing in the VM prevents reified generics, but when generics were introduced in Java 5 they decided to use type erasure in order to avoid having to produce a whole new collections standard library or break compatibility with existing code.

I meant breaking in that the Java 10 compiler would be targeting a newer version of the JVM and could dispense with having to be backwards compatible, which was one of the reasons I remember reading for generics.

They could have simply left the old library, deprecated it, and added their generified one as a new package. It had to be modified anyway.

Right, it breaks forward compatibility: bytecode for a newer VM won’t work on an older VM. IMO backward compatibility is much more important for a programming language, though.

Or they could have done erasure in the runtime when they detected non-generic usage of a generic class. Instead of forcing it on everyone.

Yup, and Microsoft did exactly that, meaning C# has been able to express List<int> for ten years now.

Nah this is a VM issue.

Type assertions have no ability to recurse. Right now you assert that both classes are a UTF8 value in the constant pool. Or really one class is a UTF8 value, which the compiler knows the first one is.

But with Generics, you may not know what it holds at runtime so this doesn't work. You'd have to constantly rebuild the constant pool (Slow) each time a new _kind_ of generic is used.

So either a new type plus bytecode would have to be introduced, or the type assertion instruction would have to become recursive. The latter is a breaking change.

Also interfaces break this because those are class-specific. So a Lorg/my/company/myClass$5; and Lorg/your/company/yourClass$0 might both implement the same interface. But that interface information isn't in the collection class; it is in those two classes' class files. So really it just needs new bytecode.

If you look at the .NET (IL) implementation of generics [1], it relies quite heavily on the VM/bytecode. Implementing a new collections library (like C#/.NET did) is probably not the cost; afaik the Java team chose type-erased generics to prevent breaking binary compatibility between Java 4 and 5 at the VM/bytecode level.

[1]: https://stackoverflow.com/a/5342424/572635

Java could have done the same by implementing erasure as a VM-level construct that's only invoked when a generic class is used in a non-generic manner. That is, using List is equivalent to using List<Object>, which I think even retains semantic equivalence. Bonus points for gating the feature behind pre-5 class versions.

These aren't breaking changes. This isn't even the first time they've added bytecodes. (Java 7 added invokedynamic.)

Isn't type erasure one of the reasons there are so many dynamic languages on the JVM?

Also, why do you consider the introduction of these new bytecodes as breaking changes for the JVM? Backward compatibility has always been one of the highest priorities for the Java architects.

> Isn't type erasure one of the reasons there are so many dynamic languages on the JVM?

No and yes.

No, it doesn't really matter for dynamically typed languages (just use Object for generic types).

Yes, it matters for statically typed languages. One of the factors in the abandonment of Scala .NET was the difficulty of creating an interoperable advanced type system in a fully reified environment.

IMO, full type erasure (like C/C++) is the right way to go. Java erasure is weird only because it didn't do it completely, not because they did it at all.

What do you mean with "full type erasure (like C/C++)"? Afaik if you have RTTI enabled you have full type information available at runtime, even for templates - which is the thing that people usually want when talking about reified generics. Is it about C++ without RTTI? Or about the fact that vector<int> is compiled completely separate from a vector<string> (monomorphization)? I think the latter one is an orthogonal issue to whether type information is available at runtime.

You're right, since C++98 it has had RTTI. Subpar example on my part :/

> No, it doesn't really matter for dynamically typed languages (just use Object for generic types).

At which point any call to the underlying environment fails with a type error. Sorry, your List<Object> is not and will never be a List<int>; trying to pass it as one won't do. Either you provide a way to construct a generic type with the right type parameters in the dynamic language or you accept that you can only inter-op with a limited subset of the underlying environment.

Of course that also gets ugly. You now have to deal with the fact that you not only have a List but several incompatible List<T> floating around, adding the contents of two lists suddenly includes the question what sort of list to return, List<Int>,List<Double>,List<Number> or an error? In a type erased context the answer is always the same, you return a List<Object>.

> No, it doesn't really matter for dynamically typed languages (just use Object for generic types).

This line is disproven by all the dynamic languages that happened and failed on the CLR.

Did they fail because they weren't possible to build on top of the CLR, or because people didn't want to use them?

> I meant breaking in that the Java 10 compiler would be targeting a newer version of the JVM and could dispense with having to be backwards compatible, which was one of the reasons I remember reading for generics.

Where are you reading that? The Java N compiler always targets version N of the JVM (well, by default, anyway), and you can't run bytecode on older versions of the JVM.

I would sincerely doubt they're considering breaking back-compat in Java 10 (that is, making it so older bytecode targeting something <10 cannot run on 10+).

Scala handles the generic edge cases using lower-bound and upper-bound types, and Scala runs on the JVM. But there could be other edge cases that cannot be solved easily through Scala's lower/upper-bound types... (I don't know enough.)

It's in fact not type erasure that "rears its ugly head", but actually that Java's type system is not expressive enough, which is why developers end up relying on reflective capabilities.

There's something seriously wrong with the type system when you want "instanceOf T" checks or "new T".

There's nothing wrong with doing 'new T' in a generic method or class body. It requires a constraint on the type variable is all: that some constructor with a specific parameter list is available.

If you don't have/want "new T", how do you ever create an instance of anything?

The in this context is a variable, not a concrete type.

Did you mean: "The T in this context..."?

Instantiating a generic type is like a function call at compile time, where the type arguments are substituted for type parameters in the body of the generic type. It's a variable at compile time, but it's a concrete type at runtime for any version of the code that can execute.
