Type erasure and reification (thegreenplace.net)
90 points by matt_d 6 days ago | 91 comments

The word 'reification' is a beautiful one that means to make something hidden visible or accessible, or to give concrete form to an abstract thing.

The first time I happened on that word was when reading about Scheme continuations. The simplest way to describe continuations is: reification of return addresses.

In a language like C every function call has a hidden argument: a return closure that consists of a text address and a value to restore the frame pointer to upon return.

Reifying the return address closure requires giving it a name, and that name is simply "continuation" -- it refers to what follows after this function returns, and the function "returns" by... calling its continuation.

Because returns (now: calls to continuation functions) are always done in the tail position by definition, the caller's stack frame can be reused. This is a key point.

There's only one other mystery needed to fully explain Scheme continuations. Namely, that function frames have to be allocated on the heap in order to get full Scheme semantics. Add delimited continuations and you can have function frame stacks again, at the price of a bit more ceremony so that the run-time can know when to call a function on a new stack.
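To make this concrete, here is a minimal continuation-passing-style sketch in Java (hypothetical names; Scheme's call/cc does much more than this): the hidden "return address" becomes an explicit argument k, and returning means calling it, always in tail position.

```java
import java.util.function.IntConsumer;

public class Cps {
    // The hidden return address is made explicit as the argument k;
    // "returning" now means calling k, always in tail position.
    static void add(int a, int b, IntConsumer k) { k.accept(a + b); }
    static void square(int n, IntConsumer k) { k.accept(n * n); }

    public static void main(String[] args) {
        // compute (3 + 4)^2 by threading the continuation through each call
        add(3, 4, sum -> square(sum, System.out::println)); // prints 49
    }
}
```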


This explanation of continuations obfuscates more than it reveals.

You say that a continuation is a reification of return addresses. But what's reification? Never mind that, what's a return address? It's a return closure that consists of a text address? You mean, text, like, a string, an array of bytes? Oh, uh, no.

And, if you're really trying to explain this to a C programmer: what's a closure? Who are you imagining talking to that already understands what a closure is but doesn't know what a continuation is? Closures are at least as confusing as continuations; arguably more so.

What's "the tail position"? What does it mean to reuse a stack frame? ("This is a key point.") Are they different from function frames?

Who does this explanation serve? Certainly not a JS programmer trying to learn for the first time how to use function* and yield. Not a C programmer.

I think this explanation is for a Scheme programmer playing Ockham's razor, defining things in terms of the smallest possible number of other things. This game can be entertaining, but it's not educational, not explanatory.

You can define arithmetic in terms of set theory, but that's not how we teach arithmetic to children. Similarly, we shouldn't explain continuations in terms of implicit return closures.

Try this: Continuations allow you to pause and resume functions. The paused functions fold up into "continuation" values, which you use to resume the function where it left off.
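One way to picture those "continuation values" is as explicitly frozen next-steps. A minimal Java sketch (Step and sumTo are made-up names; this is only the pause/resume shape, not real first-class continuations):

```java
import java.util.function.Supplier;

public class Pause {
    // A "continuation value": the rest of the computation, frozen so it
    // can be resumed later. resume == null means the function finished.
    record Step(int soFar, Supplier<Step> resume) {}

    static Step sumTo(int n, int i, int acc) {
        if (i > n) return new Step(acc, null);                 // done
        return new Step(acc, () -> sumTo(n, i + 1, acc + i));  // paused here
    }

    public static void main(String[] args) {
        Step s = sumTo(4, 1, 0);
        while (s.resume() != null) s = s.resume().get(); // resume until done
        System.out.println(s.soFar()); // prints 10 (1+2+3+4)
    }
}
```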


I disagree with your statement that grandparent's explanation was not educational.

The explanation helped me, a primarily C programmer who has done some work in higher level languages. When I read "reification of return addresses" that helped the concept click in a way it has not before.


I disagree. Parent's comment was crystal clear and actually had me thinking: "I gotta start using that." I've been struggling with explaining continuations for a long time. Passing a return address is a great explanation.

It's more than a return address though, since those are stateless. It's more like a return context since it can encapsulate state.

There is a little nugget here that I've always found illuminating: the mention of the text address and frame pointer, sometimes called the ip-ep pair, short for instruction pointer and the environment pointer.

The environment that a function executes in is important if, as in most interesting languages, there can be free variables within the body of the function. What is the environment of a running function? Usually, it's a collection of variable-name binding frames that determine the values of a program's variables at any point in time. The ep is the pointer into the data structure (e.g. a stack and its back links) that determines how variables are attached to values at a particular moment.

Lots of languages use stacks to hold these frames. In dynamically scoped languages, for example some Lisps like Emacs Lisp, a free variable's value is found in the stack frames determined by the run-time order of calls. Most modern languages use static scoping, and the frames that determine a function's free variables' values are based on the static nesting of functions or modules as they appear in the source code. In either case, a stack is usually where these frames that bind names to values are found. Dynamic vs static scope just changes which frames are searched for the binding of a variable at runtime (optimizations make this fast enough to be practical).

The text address is the reference to the next instruction (symbolic, virtual, or machine level) to be executed in a text segment (or code segment), the part of an object file that holds executable instructions; I think of it as the IP of assembly language.

These two elements form a pair that in a simplified view represents a program in execution at a specific moment in time. So, this (ip, ep) pair is like a frozen program that can be brought back to life by simply loading two values into the basic execution machinery, the instruction pointer and the environment pointer. They need to be stored somewhere when a function is invoked and reloaded so that the caller is resumed upon return of the called function. When narrowly interpreted as parts of a call stack it's easy to lose sight of the power of the (ip, ep) pair. Part of Scheme's beauty was the explicit embrace of continuations as first-class objects that can be flexibly manipulated by a running program and reified to put the program back into a previously frozen state and awaken it to continue there.
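A closure is the everyday incarnation of the (ip, ep) pair: code plus a pointer into a captured environment. A tiny illustrative Java sketch (counter is a made-up name; the one-element array stands in for an environment frame):

```java
import java.util.function.IntSupplier;

public class IpEp {
    // A closure pairs code (the lambda body, the "ip") with a captured
    // environment (the env array, the "ep").
    static IntSupplier counter() {
        int[] env = {0};
        return () -> ++env[0];
    }

    public static void main(String[] args) {
        IntSupplier c = counter();
        c.getAsInt();
        c.getAsInt();
        System.out.println(c.getAsInt()); // prints 3: the ep outlives the call
    }
}
```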

The Burroughs B5700 was designed to support a specific model of computation (Algol 60), and its hardware description in [1] has pretty diagrams of the interaction of the (ip, ep) pair and the hardware (although the book is quite dated now).

[1] E. Organick, Computer System Organization: The B5700/B6700 Series, 1973, https://www.amazon.com/Computer-System-Organization-B5700-B6...


The name reification comes from the Latin res, "thing", so it's the process that transforms something into a thing.

For a language runtime, it's the way to transform a language artifact into a runtime object.

For example, in Java, apart from the reification of type arguments, the transformation of a lambda into a proxy that implements the functional interface is also a reification.
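You can watch that reification happen from plain Java (nothing here is library-specific): the source-level lambda becomes a runtime object of a synthetic class that implements the functional interface.

```java
import java.util.function.IntUnaryOperator;

public class LambdaReify {
    public static void main(String[] args) {
        IntUnaryOperator inc = x -> x + 1;
        // The source-level lambda has been reified into a runtime object
        // that implements the functional interface:
        System.out.println(inc instanceof IntUnaryOperator); // true
        System.out.println(inc.applyAsInt(41));              // 42
        // Its class is a synthetic one generated at link time:
        System.out.println(inc.getClass().getName());
    }
}
```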


Thanks for the Latin!

I'd only heard reification in the context of data modelling, where it's often meant to model the meta-model.

For example, RDF reification. RDF involves statements in the triple form (subject, predicate, object). The triple (Mary, enrolled, UCSF) corresponds to the statement that "Mary is enrolled at UCSF". A reification might be to turn that statement into a subject, and to make a statement about the statement. Eg. ((Mary, enrolled, UCSF), created, 2018-12-06) - the statement that Mary is enrolled at UCSF was created on 2018-12-06.
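A sketch of that idea in Java, using a hypothetical Triple record (not a real RDF library): making subject and object plain Objects is what lets a statement itself become the subject of another statement.

```java
public class Reify {
    // A hypothetical minimal triple; subject and object are Object so a
    // statement can itself become the subject of another statement.
    record Triple(Object subject, String predicate, Object object) {}

    public static void main(String[] args) {
        Triple base = new Triple("Mary", "enrolled", "UCSF");
        // the reified statement: a statement about the statement
        Triple meta = new Triple(base, "created", "2018-12-06");
        System.out.println(meta.subject().equals(base)); // true
    }
}
```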


I was in a hurry. I forgot to add where the reification comes in! Scheme's call/cc is just taking the current function frame's return address closure and passing it as a function argument to the function given to call/cc -- this is the reification of the current function frame's return address closure.

Nice, clear summary.

I'd like to throw some more thoughts out there, though. Focusing overmuch on what Java did leaves the picture somewhat incomplete.

First off, I've seen some people suggest that type erasure is a problem because it leads to a loss of type safety. This isn't really true. Haskell, for example, also preserves no run-time type information. The difference there is really about strength of typing - Haskell is very strongly typed, whereas C is weakly typed, and Java falls somewhere in the middle.

Second, the Java convention of passing Class variables into generic methods in order to give them all the type information they need:

  <T> T getT(Class<T> clazz) { ... }
looks a bit like the C/C++ type erasure pattern that the author mentioned in passing at the start of the article, if you squint at it.
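For concreteness, a runnable sketch of that Class-token convention (firstAs is a made-up helper): the Class<T> argument hands back, at run time, the type information that erasure removed.

```java
import java.util.List;

public class Token {
    // Passing Class<T> supplies at run time the type that erasure removed;
    // clazz.cast performs a checked cast instead of an unchecked one.
    static <T> T firstAs(List<?> items, Class<T> clazz) {
        return clazz.cast(items.get(0));
    }

    public static void main(String[] args) {
        List<Object> xs = List.of("hello");
        String s = firstAs(xs, String.class);
        System.out.println(s); // hello
    }
}
```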

My own sense is that the reason why type erasure was a problematic choice in Java is that it places Java in this uncomfortable middle ground where you have neither the robust compile-time type checking of a strong statically typed language like Haskell, nor the robust run-time reflection of a modern dynamic language. The end result, being robust at nothing, did not turn out to be the best of both worlds.


Regardless of questions of what should or shouldn't be done, there is a serious issue with reifying generic types with type parameters that support subtyping, as the JVM does. It requires deciding on a variance policy (to determine the subtyping relationships among the generic types), and different languages choose different variance policies to fit with their overall design. Reifying generic types in this case would bake a single variance policy into the runtime, which would hinder the ability of different languages to share data structures. If the type parameters do not support subtyping, this is not a problem, and, indeed, when generic types will be instantiated with parameters of Java's upcoming value types (that cannot be subtyped), they will likely be reified and specialized.

> It requires deciding on a variance policy (to determine the subtyping relationships among the generic types)

Why does it? In C#, for example, variance policy for generic type parameters is expressed explicitly - invariant by default, covariant or contravariant if you ask for it. While C# also has reified generics, I don't see why this couldn't be done with non-reified ones.


C#'s reified generics are actually a problem for other languages that want to run on the CLR, and in fact those languages generally have data structures with no reified generics (the data structures just store type Object and perform dynamic casting). In order to work with CLR bytecode generated outside of their compilers, these languages unfortunately also have to make provisions for understanding generic data structures.

The JVM has been more successful than the CLR at hosting other languages, and having unreified generics I believe has contributed to that success. For instance, I have seen both the creator of Scala and one of the JRuby contributors say they prefer JVM generics to CLR generics.


Reification doesn't just impose a substantial complexity cost on dynamic languages on the CLR, either. See this F# feature request from 2011 that is effectively unimplementable:

https://visualstudio.uservoice.com/forums/121579-visual-stud...

As GP mentions, Haskell is a great example of achieving both superb type safety and superb type expression with type erasure. That said, sometimes you don't want superb type safety - things like dependency injection and mocks are examples of frameworks you can use in languages with some amount of type reification without restructuring your entire codebase. Java/JVM achieve a decent balance here imo.


I still don't see what that has to do with variance policy specifically. You're basically saying that reified generics are bad, because they require all languages on a given runtime to be aware of them to interoperate. Which is a valid point, but if you are aware of generics, then CLR lets you express whatever variance you want - including none. And so long as your language has CLR interface casts at all, you automatically get support for that variance at runtime, even if the type system of the language doesn't reflect such relations statically; so e.g. you might not be able to implicitly convert IEnumerable<Derived> to IEnumerable<Base>, but you should always be able to explicitly cast IEnumerable<Derived> -> Object -> IEnumerable<Base>.

The issue is not the variance of a specific data structure, but of how variance itself is expressed, which differs from one language to another. What you described is one variance policy.

Can you give an example of a language variance policy that is 1) sound, and 2) cannot be mapped onto CLR variance declarations?

I'll grant you that it's possible to come up with policies that are not sound (e.g. covariant mutable collections, like both Java and .NET do with built-in arrays) that cannot be so mapped. But how many languages have run into this as an issue in practice?

By far the most common policy, in any case, is "no variance", which maps just fine. And you don't need to support variance to consume .NET interfaces that do have variance declarations, for the reasons I have explained in my earlier comment - so it doesn't really affect interop.


Sure. Java uses use-site variance. Clojure has an "all ways" variance, as it's untyped. But Java, Kotlin (mostly declaration-site, some use-site) and Clojure all share the same data types without any need for runtime conversions. In general, you want to bake as little of the type-system as possible into the runtime; just as much as necessary to provide your most essential features. Reifying generic types with subtypable indices is a feature with very little benefit and a pretty big cost in lost flexibility.

You can do use-site declarations for upper bounds in C# as well with "where". There's no equivalent to "super", though.

The benefit of having this on VM level is the same as having classes on VM level - type safety rules are consistent and uniform, and you can't write a verified (in IL terms) component in one language that can be suddenly made unsafe by another language. Also, I don't see how you'd implement variance for is/as (instanceof and casts in Java) without runtime support; JVM languages don't have this issue only because there's no way to query an object for implementation of a particular instantiation of a generic interface on JVM in general, due to type erasure. But in C#, you can e.g. do:

   var xs = new List<Derived>();
   var ys = xs as IEnumerable<Base>;
and it'll work, even if these two lines are in totally different components. How would you do this in a typesafe manner (i.e. "as" only works if Derived is actually inherited from Base) on JVM?
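For contrast, a sketch of what erasure means on the JVM side (illustrative code, not a recommendation): the corresponding cast cannot be checked at run time at all, so it succeeds unchecked and fails only when an element is accessed.

```java
import java.util.ArrayList;
import java.util.List;

public class Unchecked {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        strings.add("oops");
        // After erasure both types are just List, so this double cast
        // succeeds at run time with nothing but a compile-time warning:
        @SuppressWarnings("unchecked")
        List<Integer> ints = (List<Integer>) (List<?>) strings;
        System.out.println(ints.size()); // 1 -- the bad cast went unnoticed
        try {
            Integer i = ints.get(0); // the implicit checkcast fails only here
            System.out.println(i);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException only at element access");
        }
    }
}
```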

> How would you do this in a typesafe manner (i.e. "as" only works if Derived is actually inherited from Base) on JVM?

On the JVM or in Java? In Java you don't do it in such a safe manner. Whether you should or shouldn't be able to is another matter. If you want to write a JVM language where you want to do that safely (like Ceylon), you just reify your type -- at the cost of interop and reuse. Opting into reification is cheaper (at runtime) than opting out.


Let me rephrase that: how do you do it across languages?

And why shouldn't you be able do it? Instantiations of generic interfaces are distinct types - the compile-time type system says so, and of course their members have different types too, so it can't possibly be a single type. So why can't you implement two different interface types on one class, just because they happen to be instantiations of the same generic? No matter how you slice it, it's an implementation restriction leaking out in a way that is inconsistent with the type system.


> how do you do it across languages?

You don't. Haskell doesn't do it for its C calls.

> And why shouldn't you be able do it?

You are able to do it, but it comes at the cost of restricting your languages to the design of the runtime. It ends up being a question of which you prefer. Different languages that can freely share code and data, or languages that are either very restricted or require a lot of (runtime) effort to share code and data.

> so it can't possibly be a single type

List<String> and List<Integer> may be two different types in Java, but they're the same type in Clojure. Erasing generic types helps you enjoy both the type safety of Java and the flexibility of Clojure, at the tiny cost of some unsafe-ish reflective operations, out of many that are unsafeish anyway.
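The "same type in Clojure" point is visible from plain Java, too: under erasure, both instantiations share a single runtime class.

```java
import java.util.ArrayList;

public class SameClass {
    public static void main(String[] args) {
        var strings = new ArrayList<String>();
        var ints = new ArrayList<Integer>();
        // Under erasure, both instantiations share one runtime class:
        System.out.println(strings.getClass() == ints.getClass()); // true
    }
}
```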

> No matter how you slice it, it's an implementation restriction leaking out in a way that is inconsistent with the type system.

It's not a restriction because a JVM language could choose to reify its generics, at the cost of foregoing efficient interop. It just so happens that most JVM languages choose not to do it that way.


> First off, I've seen some people suggest that type erasure is a problem because it leads to a loss of type safety. This isn't really true.

I'd argue it's the opposite.

Giving your programs the ability to access reified types at runtime weakens the robustness of your code, because you can second-guess the compiler and introduce crashing code, such as incorrect type casts.

Reification also comes at a cost: performance, and more problematic interoperability (Scala notoriously could not be ported to .NET because it was not possible to make its type system work properly on a reified platform).

Contrary to popular belief, Java's choice of erasure was not dictated by backward compatibility concerns but because erasure is a sound approach for type system representation.


> Contrary to popular belief, Java's choice of erasure was not dictated by backward compatibility concerns but because erasure is a sound approach for type system representation.

Do you have evidence for the "because"? I agree that erasure is a sound approach for type system representation when designing a language, but I thought that backwards compatibility was explicitly a concern in the case of Java.

Also, that good languages can be built with type erasure doesn't mean it's a good fit for the Java context where the runtime is doing a lot of dynamic dispatch based on tags that now only represent a part of the type.


> Do you have evidence for the "because"? I agree that erasure is a sound approach for type system representation when designing a language, but I thought that backwards compatibility was explicitly a concern in the case of Java.

Around 2004, when generics were being debated, Neal Gafter offered a proposal for a reified JVM that would maintain backward compatibility.

Erasure was ultimately chosen over that proposal.

Backward compatibility was not the issue: technical soundness was.


> Contrary to popular belief, Java's choice of erasure was not dictated by backward compatibility concerns but because erasure is a sound approach for type system representation.

This is false. I cannot find the source at the moment, but in a talk about implementing improved lambda support in Java 8 a Java compiler dev clearly labeled type erasure as technical debt they have to work around.

It frequently bites them when they try to implement new things, and causes weird interaction with new language features they have to work around.


In other talks Java language devs mention it as a conscious design choice.

Reading JSR-14 (https://download.oracle.com/otndocs/jcp/7566-generics-0.1-pr...), it was a conscious choice:

”Supporting generic types at run time seems undesirable for the following reasons:

- Lack of experience with such constructs in widely used languages

- Burden of extensive VM changes on vendors throughout the industry

- Increased footprint on small devices

- Decreased performance for generic methods

- Compatibility”

However, that doesn’t rule out it is technical debt, too. “Conscious design choice” and “technical debt” do not rule out each other.


Well, to be fair, with this debt they came up with something really useful: invokedynamic and LambdaMetafactory.

Scala could be ported to .NET easily enough, but it couldn’t interop with reified generic APIs. MS actually doesn’t use generics as aggressively as it could in its API for similar reasons (eg there are two APIs for collections, one that uses generics and one that doesn’t).

My understanding is that the problem was, more specifically, that they couldn't easily get Scala's types mated to .NET, and that was in turn an implication of the decision to use IKVM to run the Java binaries for Scala's standard libraries on the CLR.

Which I could see being a problem, because then you need to figure out how to smoothly marshal data back and forth between (type erased) Scala collections and .NET functions, which typically expect reified types these days.

If they had gone with producing a .NET native version of Scala's standard libraries, they might have had easier success. But it also would have been a Pyrrhic victory, because you'd have lost the ability to run any existing Scala code, which would have scads of dependencies on other bits of the JDK as well, on Scala for .NET. At which point, what's the point?

I expect that porting F# to the JVM would encounter similar problems. And I expect that which platform you want to blame them on depends on which one is your home platform.

IMO the real story here is that the two big managed platforms are effectively walled gardens, by virtue of having VMs that implement such very high-level bytecode. Both of them effectively demand that all languages running on them can interact with, in effect, a thoroughly object-oriented ABI. There being approximately as many flavors of OOP as there are OO languages, that creates a situation where, if your language isn't set up to play nice with that platform's particular brand of OOP, you're going to have a bad time.


I agree with most of what you said, but porting an erased language to the JVM is not very hard. Porting a language with reified generics to the JVM would be challenging, because generics would have to be reified, perhaps inefficiently, while clean interop would be impossible.

The JVM was never meant for other languages, that just happened, and most of the non-Java languages were designed to live nicely in the JVM world. The CLR is a bit more ambiguous, but I think it’s safe to say at this point that it is even less suited to other languages not designed specifically for it.


I think it's safe to say that was the case before .NET 4 introduced the dynamic language runtime. Afterwards, I'm not so sure - it seems like the story there was that the .NET team put a whole lot of work into building support for IronRuby and IronPython, but the languages themselves failed to take off for other reasons. Lack of community need/interest, for one.

To me, the more interesting test is, could you do a useful port of a language like C or Pascal or ML to the JVM or the CLR, while having it remain the same language. My instinct is that it's equally unworkable for both platforms. Requiring erasure vs reification is a small difference compared to the major demand that both platforms make, which is that any languages targeting them need to be at least partially object-oriented.


> To me, the more interesting test is, could you do a useful port of a language like C or Pascal or ML to the JVM or the CLR, while having it remain the same language.

Yes, you could. In fact, for C, this is experimentally proven - if you compile your C or C++ code with MSVC using /clr:pure, you'll get pure IL as output. This is by design - CLR is powerful and flexible enough to map any C construct, including raw pointers, unions etc, precisely because compiling C++ to it was always a requirement. Of course, the result isn't verifiably safe - but unlike JVM, CLR doesn't require verifiable code to be able to run it!

Consequently, if you can compile something to C, you can compile it to IL, and run it on CLR. In fact, there are even things that you can't compile to C that you can do in IL - for example, it has a tailcall opcode, which performs guaranteed tailcalls. So, yes, you would be able to compile Pascal or ML to it.

The reason why languages still adapt to CLR is because they don't just want to use it as a VM - they want to interop with other code that runs on it. In theory, it would be possible for two languages both hosted on CLR to use the equivalent of the C ABI to interoperate, but in practice nobody does that, and of course the .NET standard library is nothing like that. So if your language wants to be able to use .NET libraries, it has to support the .NET object model - and it is at that point that you get language dialects like C++/CLI.


The DLR was only ever going to work out for dynamic languages, and even then I think it was pretty limited to class-based ones like Python and Ruby.

Unfortunately, the DLR seems to be pretty much deprecated at this point. It's a shame; I was able to do great things with the run-time generation API (which now defaults to an interpreter in UWP).


> It's a shame; I was able to do great things with the run-time generation API (which now defaults to an interpreter in UWP).

That's more to do with UWP being an AOT platform than with the DLR being deprecated; it should still work fine with .NET Core?


They are rebooting the .NET AOT story after the 3.0 release.

It was discussed during this week's Connect(), in the ".NET today and tomorrow" Q&A session.

https://channel9.msdn.com/Events/Connect/Microsoft-Connect--...

The video is not yet available though.


I have no idea. I’ve mostly been interested in UI work anyways, and not being able to move to the new UI stack really sucked for me. Now it’s all just a bunch of bad choices.

The DLR goes beyond running dynamically typed languages on CLR, though. For example, there's the Python.NET library, which lets you embed regular Python (not IronPython!) into a .NET app - and it still uses "dynamic" etc.

> But it also would have been a Pyrrhic victory, because you'd have lost the ability to run any existing Scala code, which would have scads of dependencies on other bits of the JDK as well, on Scala for .NET. At which point, what's the point?

Wouldn't it be in the same boat as scala-js? There are quite a few libraries that support scala-jvm and scala-js.


> But it also would have been a Pyrrhic victory, because you'd have lost the ability to run any existing Scala code, which would have scads of dependencies on other bits of the JDK as well, on Scala for .NET. At which point, what's the point?

Ask the parade of JVM languages porting themselves to native and JS runtimes.


Those are both very different use cases.

Why are they very different? They port the language over to another platform and need to interop with it.

For example scala.js can call js libraries through facades (similar to typescript typing files) but can also be called from javascript.

Scala.native can call c libraries (I have no idea about the other way around).


Targeting native because it's fundamentally a less thorny issue. To an approximation, all you do is take the JIT compilation pipeline and use it to do an AOT compilation instead.

Targeting JavaScript because people expect different things out of it. The goal there is to transpile some portion of your code to run on the browser. There's not the same expectation that you'll be able to truck Spring or Akka along with you. The expectation is that the transpiled code is mostly only going to interact with the DOM and other JavaScript libraries, plus maybe whatever subset of your business objects you feel the need to have on both the client and server side.

By contrast, the only real use case I can see for porting Scala to .NET is so that you can bring a bunch of pre-existing Scala libraries with you and then interface them with some pre-existing .NET codebase. If you're not looking for that, you'd probably be willing to satisfy yourself with either F# or the JVM.


> MS actually doesn’t use generics as aggressively as it could in its API for similar reasons (eg there are two APIs for collections, one that uses generics and one that doesn’t).

All modern .NET APIs use generics heavily, and have done so since at least .NET 3.5 (e.g. LINQ is pretty much all generics).

The reason why there are two collection APIs is because one of them predates generics. It is generally considered deprecated and rarely used, with exception of untyped IEnumerable, and that mostly because it's the base interface of IEnumerable<T>. It's not dissimilar to how Java also has legacy collections from 1.0 days.


The non-generic super-interface pattern isn’t completely deprecated. It’s still advocated as a workaround for the lack of existential types in the language.

> MS actually doesn’t use generics as aggressively as it could in its API for similar reasons (eg there are two APIs for collections, one that uses generics and one that doesn’t).

Object-based collections are generally deprecated in favor of the generic versions, and existed prior to generics being introduced.


You’ll see that they still get a lot of use in the interfaces of non-deprecated APIs (eg WPF).

WPF predates generics in .NET, or at least most of the WPF APIs were stable and finished before generics arrived, and retrofitting generics wasn't done for some reason or another.

To be fair, they tried to kill off WinForms and WPF. I mean, they invented UWP, and I doubt they wanted to maintain three of these GUI frameworks.

UWP, aka WinRT, was a takeover of .NET by the WinDev team after they regained internal power with Vista replacing Longhorn, producing COM versions of the .NET designs.

If you compare WinRT with the original design from Ext-VOS (COM+ Runtime) before it became the CLR, you will find quite a few similarities.


Agreed. But I don't think UWP uses generics very much (if at all) either.

To be honest, it feels like Microsoft's UI strategy these days is either UWP for the store or just use the web.


Can use the Desktop Bridge https://docs.microsoft.com/en-us/windows/uwp/porting/desktop... with .NET Core 3.0 and the newly open sourced UI frameworks to get full Jitting and Store access (UWP is microsoft-ui-xaml https://github.com/Microsoft/microsoft-ui-xaml)

It's OT, but still...

> Microsoft's UI strategy

At the moment I really think that they do not have a strategy. I mean, they even have Xamarin.Forms, which is another XAML variant.

That ups their UI stuff to 4. I do not see any of them being deprecated at all.


UWP is just improved COM, so generics cannot be easily done as general purpose language ABI.

WinRT does generics by synthesizing interface specializations (so e.g. their GUID is effectively a hash of the original interface name + type parameters). Since the mapping is exposed as a public function, generics are a part of the ABI, and are mapped accordingly in language projections.

https://docs.microsoft.com/en-us/windows/desktop/api/roparam...


Yep, I should have been more explicit; my point with "cannot be easily done" was the amount of stuff going on behind the scenes versus what happens in language runtimes.

Thanks for clarifying it.


> Reification also comes at a cost: performance

On the contrary, reified generics give the implementation more leeway with representation choices which can help with performance. In Java for instance a `List<Integer>` is stored as an array of boxed Integer objects, without easily being able to store the values inline.
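A small illustration of that boxing cost (plain Java, standard library only): List<int> is illegal under erasure, so every element is a separate heap object, and storing values inline means dropping down to a primitive array.

```java
import java.util.List;

public class Boxing {
    public static void main(String[] args) {
        // List<int> is illegal under erasure; every element is a separate
        // heap-allocated Integer object reached through a pointer.
        List<Integer> boxed = List.of(1, 2, 3);
        System.out.println(boxed.get(0).getClass().getName()); // java.lang.Integer
        // Storing values inline means using a primitive array instead.
        int[] inline = {1, 2, 3};
        System.out.println(inline.length); // 3
    }
}
```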


That can be done even more easily with properly erased generics.

This is another of those unfortunate implications of Java's having disappeared into this ugly chasm between static and dynamic typing. It doesn't fully erase types, so you can't do like C++ or Haskell do and just generate a completely different type, because it wouldn't be compatible with any functions that work with the generic type. At least, not without breaking backward compatibility, which was the whole point of doing type erased generics in the first place.


> This is another of those unfortunate implications of Java's having disappeared into this ugly chasm between static and dynamic typing. It doesn't fully erase types, so you can't do like C++ or Haskell do and just generate a completely different type, because it wouldn't be compatible with any functions that work with the generic type.

Full monomorphisation is not compatible with separate compilation though.


I'm curious, what is not "proper" about Java's erased generics?

Reified types also bring other advantages: runtime reflection can be useful, can enable meta-programming, and allows checking binary compatibility when linking dynamically.

So given that you want a language with reified types, losing that reification for generics creates awkward cases in the language and forecloses optimization opportunities.


They're not erased all the way, they're only erased down into the root type for all objects. So you've still got some type information left (namely, that it's an Object), and any types that are not Objects are officially banned from the generics system.

Hence the need to dress them up as washerwomen and sneak them in through the service entrance.


Haskell has erased generics and unboxing for days.

> ... for days.

Is that an estimate of Haskell compilation time?


My professional experience has been that Haskell compile times are great tyvm :)

"Days" is obviously hyperbole, but I typed that while waiting for a Haskell build and I'm still waiting on that same build. Granted that I'm rebuilding a whole lot more of the world than typical, but build times are definitely one of my few complaints about working in Haskell (though it's been improving).

> Contrary to popular belief, Java's choice of erasure was not dictated by backward compatibility concerns but because erasure is a sound approach for type system representation.

I wouldn't call it a "sound approach for type system representation", when it can't even consistently represent the fact that generic interfaces instantiated with different type parameters are different interfaces. I mean, it's obviously true given that any parameter-dependent method signatures are different, yet you can't implement Foo<Bar> and Foo<Baz> on the same class.

> Reification also comes at a cost: performance

What's the performance cost of reification? When comparing C# to Java, the only thing I can think of is that C# has an advantage in performance, because, due to generics being reified, its JIT can compile instantiations separately.


> you can't implement Foo<Bar> and Foo<Baz> on the same class.

Works for me:

    import java.util.List;

    class Test {
      public void setListBoolean(List<Boolean> list) {}
      public void setListString(List<String> list) {}

      public List<Boolean> getListBoolean() { return null; }
      public List<String> getListString() { return null; }

      public static void main(String[] args) {}
    }

You're using it, not implementing it. Try:

   class Test implements List<Boolean>, List<String> { ... }
For another example, try your snippet, but rename setListBoolean and setListString to just setList (so that they're overloads).

OK I see what you mean now, but that is the combination of type erasure and allowing method overloading that is causing the issue, not the erasure itself. For example, look at a language that doesn't allow overloads, like OCaml. Its types are fully erased, allowing very efficient compiled or transpiled code, e.g. https://reasonml.github.io/en/try?rrjsx=false&reason=LYewJgr...

Overloading is orthogonal to all this. The reason why it can't implement two generic interface instantiations on the same class is not because method names clash (even sans overloading, the language can provide means to implement interface methods with differently-named class methods - e.g. C# does that). It's because at runtime, once you erase the type parameter, there's only one interface type, and you cannot implement it twice.

The reason why OCaml doesn't have this problem is because it doesn't have runtime type checks of the kind Java does - there's no "instanceof", and there's no downcasts or cross-casts. So it doesn't need reified types, because it doesn't allow the programmer to perform operations that expose the difference. So, for OCaml, type erasure is a sound approach that is consistent with its type system. But not for Java, because of casts and "instanceof".
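The "only one interface type at runtime" point is easy to demonstrate (a minimal sketch, class and method names mine): two lists with different type arguments share the exact same runtime class.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // After erasure there is only one ArrayList class at runtime, so
    // getClass() (and instanceof) cannot distinguish type arguments.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
    }
}
```

That single runtime class is exactly why a class can't implement the interface twice with different parameters: the JVM would see it implementing the same interface twice.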


Your functions are named differently, though.

> Contrary to popular belief, Java's choice of erasure was not dictated by backward compatibility concerns but because erasure is a sound approach for type system representation...

...for languages that don't have casting and have correct variance. Java fails on both.


> Contrary to popular belief, Java's choice of erasure was not dictated by backward compatibility concerns but because erasure is a sound approach for type system representation.

That statement is obviously wrong since Java did not choose type erasure.

https://docs.oracle.com/javase/8/docs/api/java/lang/Object.h...

In Java 5, generic types were added. And those Java chose to erase.

So Java chose both: to reify and erase.


Haskell implementations typically pass around dictionaries at runtime when you use type classes. Since there is one per instance, this could be thought of as a form of type reification.

https://www.schoolofhaskell.com/user/jfischoff/instances-and...


I don't agree that this is type reification. The dictionaries are resolved at compile time. They may indeed be passed to a function at run-time, but they are not exposed to the user. They are simply a compile-time substitution of abstract function invocations with the implementations according to the chosen type. For actual reification of types in Haskell, you can look at Haskell's Data.Typeable library.

Well, it's an internal data structure needed at runtime that corresponds to a type, which seems close enough for a rough analogy. But I agree that it's not a language feature.

It depends on whether you're looking at it from the implementation perspective, or from language user perspective. To the implementation, a dictionary carrying type information (or anything derived from that, such as e.g. method dispatch tables) at runtime is type reification.

The concept of reification exists only in the context of a chosen language and its universe of objects, and it doesn't extend to the language's implementation. Given the language Haskell and its universe of objects, Data.Typeable provides a reification of Haskell's types into physical objects inside that universe. The type class dictionaries are not physical objects inside this universe.

By that argument, a Java object's methods are also resolved at compile time.

I didn't structure my reply very well. I didn't mention the fact that the dictionaries are resolved at compile time as a contradiction to anything.

Does anyone know of a good blog post or article that explains why Java's type erasure approach to generics is considered problematic? I'd be interested to read about the specific problems it causes.

It's a bunch of small things, none of which are showstopping, all of which are annoying.

You can't have these two functions co-existing in the same class:

  void Frob(List<String> someList) { ... }
  void Frob(List<Integer> someList) { ... }
because, after erasure, they have identical signatures.

You've got to repeat yourself a bunch, because generic methods don't know their own type parameters. So, to take an example using the Jackson JSON-handling library:

  MyClass foo = objectMapper.readValue(json, MyClass.class);
is a bit annoying if you're coming from a language like C#, where you'd expect to be able to write code more like:

  MyClass foo = objectMapper.readValue(json);

Generic type instantiations aren't really types in Java, they're more this sort of type-ish concept. List<String> and List<Integer> exist as distinctions made by Java's type checker, but their actual class is just List - there's no such thing as List<String>.class. That's what's going on with that method overload example above. It's also the reason that this won't even compile, and you've got to resort to hacky workarounds in this sort of situation:

  Node<MyClass> foo = objectMapper.readValue(json, Node<MyClass>.class);
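The usual hacky workaround is the "super type token" trick (a sketch of the general mechanism, not Jackson's exact API; the `TypeToken` name here is mine): type arguments written in an anonymous subclass's extends clause do survive erasure in the class-file metadata, and reflection can read them back.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Super type token: the type argument used when subclassing is
// recorded in the subclass's class file, unlike ordinary instances.
abstract class TypeToken<T> {
    final Type type;

    protected TypeToken() {
        ParameterizedType superclass =
            (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superclass.getActualTypeArguments()[0];
    }
}

class TokenDemo {
    static String capturedType() {
        // The trailing {} creates an anonymous subclass; that's what
        // captures List<String> so reflection can recover it.
        TypeToken<List<String>> token = new TypeToken<List<String>>() {};
        return token.type.toString();
    }

    public static void main(String[] args) {
        System.out.println(capturedType());
    }
}
```

This is the mechanism behind things like Jackson's TypeReference: you're smuggling the type argument in through inheritance because the language gives you no direct way to say `List<String>.class`.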

Then there's all the stuff about boxing, and how you can't have generic collections of atomic values, only boxed values. Which is responsible for a lot of excess memory consumption and pointer chasing, and also for things like that whole extra side quest in Java 8 streams where you get classes like IntStream and DoubleStream (but not FloatStream).
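A small comparison of the two styles (sums chosen arbitrarily): the primitive specialization works on ints directly, while the generic stream boxes every element and needs an explicit unboxing step before it can sum.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

class StreamDemo {
    // Generics can't range over primitives, hence the hand-specialized
    // IntStream/LongStream/DoubleStream side of the API.
    static int sumPrimitive() {
        return IntStream.rangeClosed(1, 100).sum(); // no boxing
    }

    static int sumBoxed() {
        // Stream<Integer> boxes each element; mapToInt unboxes them
        // again before summing.
        return Stream.iterate(1, n -> n + 1)
                     .limit(100)
                     .mapToInt(Integer::intValue)
                     .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive()); // 5050
        System.out.println(sumBoxed());     // 5050
    }
}
```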

Besides those you also can't do something like

    Object xyz = ...;
    if (xyz instanceof List<String>) { doA(); }
    else if (xyz instanceof List<Integer>) { doB(); }
because there is only a single List type at runtime.

As long as you are able to take the same code path for all empty lists, you can do something like

  if (xyz instanceof List) {
    List<?> list = (List<?>) xyz;
    if (list.isEmpty()) { ... }
    else if (list.get(0) instanceof String) { ... }
    else if (list.get(0) instanceof Integer) { ... }
  }
Which, I admit, is a gross hack.

And, conversely, you can't implement both List<String> and List<Integer> on the same class.

Though, even in C#, it would be inadvisable. If you have the same class implement both `IList<String>` and `IList<int>` you'll have some ambiguity to deal with in implementing

  object IList.this[int index] { ... }
The expectation is that that property's getter should just upcast the return value of the generic version of the indexer.

For both languages, you'd be better off having two separate methods for the two different kinds of list.


The point of doing this is not to invoke these members directly on the object, though. It's to allow the object to be passed to anything that expects either IList<string> or IList<int>, or for runtime discovery purposes with is/as.

Perhaps a better example is something like IConvertibleTo<T>. There's an obvious meaning to a class implementing it for many different Ts.


There’s actually another category: type reification WITH type erasure! https://github.com/mrakgr/The-Spiral-Language/blob/master/re... (search for default_of)

Well, kinda cheating since the language’s about inlining these away. But technically true...


Here is an interesting blog post by Gilad Bracha on reified generics:

https://gbracha.blogspot.com/2018/10/reified-generics-search...


META

I really like this guy's articles. If you look in the archives [1], he's been writing prolifically for at least 15 years.

Let's zoom in on 2018: [2]

* Three reading-summary posts covering at least 40 books across miscellaneous subjects (some of them re-reads)

* Articles on type theory

* Articles centered on specific programming languages like Python, Haskell, and Go

* Articles on concurrency

* And much more! :-)

I think he has "seasons" of learning particular languages. I think I've read some of his articles on Clojure, but he stopped writing about it eventually. According to StackOverflow [3], he appears to be an expert in Python.

In his Github account [4], I see a bunch of repos including one for code on his blog, but I don't think there's any particular repo he focuses on (maybe pycparser?).

Apart from making me feel inadequate about my own accomplishments this year (how many books have I read?), I'm curious what one would do after such a sizable knowledge ingestion. I'm surprised this guy doesn't attempt to come up with his own programming language. Maybe he knows it's not worth it? Maybe he knows it's better to learn the nooks and crannies of existing languages than to try to come up with a new one?

Maybe he just reads books instead of going dancing, watching movies, socializing in a bar, listening to music, browsing the web, playing the piano, or whatnot. Maybe he does all these things and still has time for reading and learning and writing his blog.

Sometimes I wish/hope/expect that everything I've learnt will someday allow me to create something, anything, where I can mix and match pieces of everything I've taken in. For sure this guy already benefits from all his knowledge in his day job, his side projects, and maybe in his everyday life and interactions. I don't know, but I'm curious.

I mostly punish myself, all the time, for not being more productive, all the time.

1: https://eli.thegreenplace.net/archives/all

2: https://eli.thegreenplace.net/archives/2018

3: https://stackoverflow.com/users/8206/eli-bendersky?tab=profi...

4: https://github.com/eliben?utf8=%E2%9C%93&tab=repositories&q=...



