I felt the same slight sadness when I saw the complexity of the planning involved in Java value types, and the many-year path to get there. Intuitively, a sufficiently smart compiler should have been able to take care of this, and in some cases it does. So it's worth reflecting on why adding new opcodes and such is necessary.
HotSpot and some other JVMs can do an optimisation called "scalar replacement", which converts objects into collections of local variables which are then subject to further optimisation. So for example a Triple<A, B, C> type class could be converted into three variables, then the optimiser notices that the second element of the triple was never actually used anywhere, and eliminates it entirely.
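To make the Triple example concrete, here is a rough sketch of the before and after. The class and method names are made up for illustration, and the "after" method is only a hand-written approximation of what a JIT might produce; the real transformation happens on the compiler's intermediate representation, not on source code.

```java
// Hypothetical illustration class; Triple is not part of the JDK.
final class Triple<A, B, C> {
    final A first;
    final B second;
    final C third;

    Triple(A first, B second, C third) {
        this.first = first;
        this.second = second;
        this.third = third;
    }
}

final class ScalarReplacementSketch {
    // What the programmer writes: the Triple never leaves this method,
    // so it is a candidate for scalar replacement.
    static int asWritten(int a, int b, int c) {
        Triple<Integer, Integer, Integer> t = new Triple<>(a, b, c);
        return t.first + t.third; // t.second is never read
    }

    // Hand-written approximation of the optimised result: the fields
    // become locals, and the unused second element disappears.
    static int afterScalarReplacement(int a, int b, int c) {
        int first = a;
        int third = c;
        return first + third;
    }

    public static void main(String[] args) {
        System.out.println(asWritten(1, 2, 3));
        System.out.println(afterScalarReplacement(1, 2, 3));
    }
}
```

Both methods return the same result; the difference is only in whether an object has to be allocated along the way.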
Scalar replacement is one of several optimisations that rely on the output of an escape analysis. Escape analysis is the better-known term, so people often use the name of the analysis to mean the optimisations it unlocks, although that's not quite precise. Sometimes people talk about stack allocation, but that isn't quite right either. Only objects that don't "escape" [the thread] can be scalar replaced.
There are several reasons this isn't enough and why Java needs explicit support for value types.
Firstly, the JVM implements two different EA algorithms. The production algorithm that is used with the out of the box JIT compiler (C2) is somewhat limited. It can identify an object as escaping just because it might escape in some situations, even if it often doesn't. There is a better algorithm called "partial escape analysis" implemented in the experimental Graal JIT compiler, but Graal isn't used by default. In Java 8 it requires a special VM download. It'll be usable in Java 9 via command line switches. PEA can unlock optimisation of objects along specific code paths within a method even if in others it would escape, because it enables the un-optimisation of escaped objects back onto the heap.
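A small sketch of the kind of method where the difference shows up. The names (Point, cache, lengthSquared) are hypothetical, and whether a particular compiler actually optimises this depends on inlining and profiling, so treat it as an illustration of the shape of the problem rather than a benchmark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class PartialEscapeSketch {
    static final List<Point> cache = new ArrayList<>();

    static int lengthSquared(int x, int y, boolean remember) {
        Point p = new Point(x, y);
        if (remember) {
            // p escapes to the heap, but only on this path.
            cache.add(p);
        }
        // On the path where remember is false, p never escapes.
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3, 4, false));
        System.out.println(lengthSquared(3, 4, true));
        System.out.println(cache.size());
    }
}
```

An analysis that treats "might escape" as "escapes" has to allocate p on every call. A partial escape analysis could, in principle, keep p as scalars and only materialise it on the heap inside the `remember` branch.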
Graal unfortunately can't just be dropped in. For one, you don't just replace the entire JIT compiler for a production-grade, mission-critical VM like HotSpot. It may take years to shake out all the odd edge-case bugs revealed by usage on the full range of software. For another, Graal is itself written in Java and thus suffers warmup periods where it compiles itself. The ahead-of-time compilation work done in Java 9 is - I believe - partly intended to pave a path to production for Graal.
Implementing PEA in C2 is theoretically possible, but I get the sense that there isn't much major work being done on C2 at the moment beyond some neat auto-vectorisation work being contributed by other parties. All the effort is going into Graal. I've heard that this is partly because it's a very complex bit of C++ and they're afraid of destabilising the compiler if they make major changes to it.
Unfortunately even with PEA, that still isn't enough to replace value types.
(P)EA can only work on code that is in-memory when the compiler's optimisation passes are operating, and moreover, only in-memory in the form of compiler intermediate representation. In a per-method compilation environment like most Java JITCs, that means it can only work if the compiler inlines enough code into the method it's compiling. Graal inlines more aggressively than C2 does but it still has the fundamental limitation that it can't do inter-procedural escape analysis.
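Here is a sketch of where the inlining boundary bites. Vec2, Measure and the method names are invented for the example, and which calls actually get inlined is up to the JIT's heuristics, so the comments describe what is likely rather than guaranteed.

```java
// Hypothetical illustration types.
final class Vec2 {
    final double x;
    final double y;

    Vec2(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

interface Measure {
    double of(Vec2 v);
}

final class InliningBoundarySketch {
    // Small static method: a likely inlining candidate.
    static double dot(Vec2 a, Vec2 b) {
        return a.x * b.x + a.y * b.y;
    }

    // If dot() is inlined, the compiler can see that v never escapes
    // and may scalar replace it.
    static double direct(double x, double y) {
        Vec2 v = new Vec2(x, y);
        return dot(v, v);
    }

    // If the call through the interface is not inlined, the compiler
    // cannot see what m does with the Vec2 and must assume it escapes.
    static double viaInterface(Measure m, double x, double y) {
        return m.of(new Vec2(x, y));
    }

    public static void main(String[] args) {
        System.out.println(direct(3, 4));
        System.out.println(viaInterface(v -> v.x + v.y, 3, 4));
    }
}
```

An inter-procedural analysis would look inside the callee instead of relying on inlining to bring its body into view.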
It can be asked, why not? What's so hard about inter-procedural EA?
That's a really good question that I wish John Rose, Brian Goetz and the others had written a solid full design doc or presentation on somewhere. They've said it would be incredibly painful, and obviously view the Valhalla (also incredibly painful) path as superior, so it must be hard. In the absence of such a talk I'll try and summarise what I learned by reading and watching what's out there.
Firstly, auto-valueization - which is what we're talking about here - would necessitate much, much larger changes to the JVM. Methods and classes are the atoms of a JVM and changing anything about their definition has a ripple effect over millions of lines of C++. HotSpot isn't just any C++ though - it's one of the most massively complex pieces of systems software out there, with big chunks written in assembly (or C++ that generates assembly). The Java architects have sometimes made references to the huge cost of changing the VM. They clearly perceive changing the Java language as expensive too, but compared to the cost of changing HotSpot, it can still be cheaper to complicate the language. Frankly it sounds like Java is groaning under the weight of HotSpot - it's fantastically stable and highly optimised, but that came at large complexity and testing costs that have to be considered for every feature. Part of the justification for doing Graal is that the JITC is the biggest part of the JVM and by rewriting it in Java, they win back some agility.
As an example of why this gets hard fast, consider that the compiled form of a "valueized" version of a parameter is different to the reference form. The same goes for return values (a value can be packed into multiple registers). So you either have to pick a form and then box/unbox if the compiled version doesn't 'fit' into the call site, or compile multiple forms and then keep track of them inside the VM to ensure they're linked correctly. And a class that embeds an array may wish to be specialised to a value-type array, but then you need to detect that and auto-specialise, detecting any codepaths that might assume reference semantics like nullability or synchronisation, and then you have to be able to undo all that work if you load a new class that violates those assumptions.
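To illustrate the "codepaths that assume reference semantics" part, here is a sketch of perfectly ordinary Java that an automatic valueizer would have to detect. Money and the helper methods are hypothetical names; the point is only that each method depends on the argument being a real heap object.

```java
// Hypothetical illustration class: looks like an ideal value type.
final class Money {
    final long cents;

    Money(long cents) {
        this.cents = cents;
    }
}

final class ReferenceSemanticsSketch {
    // Depends on object identity, not on field contents.
    static boolean sameObject(Money a, Money b) {
        return a == b;
    }

    // Depends on the reference being nullable.
    static boolean isMissing(Money m) {
        return m == null;
    }

    // Depends on the object having a monitor to lock on.
    static long lockedRead(Money m) {
        synchronized (m) {
            return m.cents;
        }
    }

    public static void main(String[] args) {
        Money a = new Money(5);
        Money b = new Money(5);
        System.out.println(sameObject(a, b));
        System.out.println(isMissing(null));
        System.out.println(lockedRead(a));
    }
}
```

Any one of these, appearing in a class loaded later, would force the VM to undo a flattened representation it had already committed to.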
Secondly, detecting if something can be converted into a value type (probably) requires a whole program analysis. Java is very badly suited to this kind of analysis because it's designed on the assumption of very dynamic, very lazy loading of code. Not just because of applets and other things that download code on the fly, but it's quite common in Java-land for libraries to write and load bytecode at runtime too. Also Java users and developers expect to be able to download some random JAR and run it, or a random pile of JARs and run it, with essentially no startup delay. HotSpot can run hello world in 80 milliseconds on my laptop and the very fast edit-run cycle is one of the things that makes Java development productive relative to C++.
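As a tiny example of code appearing at runtime, the JDK's own `java.lang.reflect.Proxy` defines a brand new class while the program is running. Greeter and makeGreeter are made-up names for the sketch; the Proxy calls are the standard JDK API.

```java
import java.lang.reflect.Proxy;

// Hypothetical illustration interface.
interface Greeter {
    String greet(String name);
}

final class RuntimeClassSketch {
    // Returns an instance of a class that did not exist until this call:
    // the JDK generates and loads the proxy class on the fly.
    static Greeter makeGreeter() {
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                (proxy, method, args) -> "hello " + args[0]);
    }

    public static void main(String[] args) {
        Greeter g = makeGreeter();
        System.out.println(g.greet("world"));
        System.out.println(Proxy.isProxyClass(g.getClass()));
    }
}
```

A whole program analysis done before this point could not have known about the generated class, so any conclusion it reached would have to be revisable after the fact.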
All that said, applets are less important than they once were, and Java 9 does introduce a "jlink" tool that does some kinds of link-time optimisations:
Java has two fundamentally different kinds of types: objects and primitives. Value types are user-defined primitives. Seems pretty straightforward, from a language design perspective. I'm not sure why we're trying so hard to avoid this.
You'd need to generalize stacks to full-fledged region analysis to see any real benefits, but only MLKit has taken it this far IIRC.
Nothing's impossible I guess, but this kind of global datastructure reshaping is definitely far beyond the state of the art in Java JIT compilers.
My mistake--I thought "regions" meant some sort of interprocedural "code regions".
I suspect the state of the art in GC has moved in the last 15 years since this paper was written, and they might have trouble beating modern techniques, but that's speculation.
Escape analysis is a degenerate region analysis, and many of the cases where it fails because it's not general enough would be handled quite well by regions.
Note that they solved their problems using exactly the features being added in Java 9.