C2 already does escape analysis and scalar replacement of aggregates to turn objects that don't escape a compilation unit into dataflow edges, which, when possible, is far better than the stack allocation in this article: you'd rather not write to the stack if you don't have to, and you'd like unused parts of the object to disappear entirely. Graal then adds partial escape analysis, which floats reification of an object to the last safe moment.
What this does on top of that is cover the corner case where an object would be safe for SRA, except that something requires it to have the same layout as on the heap. A concrete example of this is certain intrinsics, and merges where an object needs to be allocated on just one branch, so that code following the merge can see a full object in both cases and not care that one of them was allocated on the stack.
The key to avoiding confusion: 'allocating on the stack' here means literally allocating the full reified object in stack memory, whereas when people informally say 'stack allocation' about C2 today, they usually mean SRA.
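A minimal sketch (names are illustrative, not from the article) of the kind of code C2's escape analysis can already handle via scalar replacement: the object never leaves the method, so after inlining its fields can become plain dataflow edges and no allocation happens at all, on the heap or the stack.

```java
// Sketch: the Point below never escapes distSq(), so C2 can scalar-replace
// it -- the fields x and y become ordinary values in registers, and the
// allocation disappears entirely.
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static int distSq(int x, int y) {
        Point p = new Point(x, y);    // does not escape this method
        return p.x * p.x + p.y * p.y; // field reads fold to the inputs
    }

    public static void main(String[] args) {
        System.out.println(distSq(3, 4)); // prints 25
    }
}
```

The corner case discussed above is when something in the method still needs `p` laid out exactly as a heap object would be; that's where allocating the full layout in stack memory comes in.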
C2 already removes reference-type allocations when they don't escape a compilation unit, which is something I think C# doesn't yet do at all, for reasons I don't really understand.
Yes, and I'd also add that SROA typically enables lots of further optimizations, since it lets the compiler treat object fields as though they were ordinary local variables.
I'm a little surprised to see such noticeable performance improvements. Obviously benchmarks, etc., but I didn't expect these cases to be quite so common.
Recent-ish versions of .NET (Core) have introduced ValueTuple and ValueTask. These value types avoid the heap allocations that Tuple and Task incur.
Hopefully Microsoft thinks this makes enough real-world difference to justify the pain of maintaining two sets of types.
Or, asked differently: what is the difference between an object with properly overridden hashCode()/equals() methods that is effectively only being used as a data container, and a struct?
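To make the question concrete, here's a sketch of such a data-container class (the `Complex` name is just an example). Semantically it's very close to a struct; the difference is that the JVM still gives it heap identity (object header, reference semantics) unless the JIT can prove that identity is never observed.

```java
import java.util.Objects;

// A plain data carrier: immutable fields, value-based equals/hashCode.
// This is "a struct in spirit", but the JVM treats it as a heap object
// with identity unless escape analysis proves the identity is unobserved.
public final class Complex {
    final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }

    Complex plus(Complex o) { return new Complex(re + o.re, im + o.im); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Complex)) return false;
        Complex c = (Complex) o;
        return re == c.re && im == c.im;
    }
    @Override public int hashCode() { return Objects.hash(re, im); }
}
```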
For final classes, unpacking into registers might work, but at least on the JVM you can subclass final classes via reflection. If I understand correctly, this avoids repacking when calling methods that expect the normal memory layout by actually allocating the normal object layout, just on the stack instead of the heap.
Most Java call sites are monomorphic in practice. They may look virtual according to the language spec, but that isn't how they're really implemented.
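A small illustrative sketch (my own example): the call below is virtual per the language spec, but if only one implementation is ever loaded, HotSpot can devirtualize it via class-hierarchy analysis and inline caching, inline the body, and deoptimize later if a second subclass appears.

```java
public class MonoDemo {
    public interface Shape { double area(); }

    public static final class Square implements Shape {
        final double side;
        public Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    // s.area() is a virtual call in source, but with Square as the only
    // loaded Shape it's a monomorphic site the JIT can turn into a direct,
    // inlined call.
    public static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }
}
```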
It cannot always do so without affecting the semantics. An array of structs will still have to be an array of references in most cases, for example. The compiler isn't sufficiently smart to figure out such things beyond trivial examples.
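A sketch of why the array case is hard (example names are mine): the elements of an object array are references, and reference semantics such as aliasing the same element from two arrays are observable, so the compiler can't silently flatten the array into inline struct storage.

```java
public class RefArrayDemo {
    static final class Point { int x; Point(int x) { this.x = x; } }

    // The same Point object is reachable from two arrays. Flattening either
    // array into inline (struct-like) storage would copy the value and
    // break this aliasing, which is visible to the program.
    static int aliasedRead() {
        Point shared = new Point(1);
        Point[] a = { shared, new Point(2) };
        Point[] b = { shared };  // same object, second array
        a[0].x = 42;             // write through one array...
        return b[0].x;           // ...is observed through the other: 42
    }

    public static void main(String[] args) {
        System.out.println(aliasedRead()); // prints 42
    }
}
```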
In general this applies to any optimization: the compiler can figure out only a subset of the optimizations the programmer can. That's why not giving the programmer control of memory layout is objectively a bad idea.
Also, when you need performance, doing the optimization yourself is easier than verifying that the compiler actually applied it. Compilers are quite unpredictable.
Adding value types to a VM or language means exposing that sort of feature in a way that programmers can use, and that provides the sort of safety guarantees they are used to.
I wonder what other lovely optimizations/improvements they will bring :}
Especially which optimizations from the C# world they will bring over to the JVM world.
According to the Benchmarks Game, C# is the fastest JITed language on earth.
It helps that the team has done a lot of optimizations for regex and other parts of the standard lib in the past few months.
You stopped without providing any alternative data beyond ".NET is slower because I say it's slower."
You stopped without telling us about the benchmarks game C# program which uses System.Text.RegularExpressions
Long-running VMs like HotSpot in "server" mode do a lot of expensive optimizations over time. The Benchmarks Game makes no attempt to warm up these VMs, so it doesn't actually measure how fast a VM is in practical hosting scenarios.
Anecdotally, I have heard that Java is still faster than C# in most benchmarks when both VMs have had time to do a full JIT. I've also heard that Golang is much slower than C# and Java when their JITs are allowed to fully warm up.
I've been meaning to build a "warmed up" version of the Benchmarks Game specifically for testing VM languages but never got around to it. If someone else wants to pick up the torch I would be eternally grateful!
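The basic shape of such a harness is simple (this is a hand-rolled sketch, not JMH, and the workload is a stand-in): run the benchmark body enough times for the JIT to compile it before you start the clock, which is exactly what one-shot runs skip.

```java
public class WarmBench {
    // Stand-in for a real benchmark body.
    static long workload(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += (long) i * i;
        return acc;
    }

    static double measureMsPerOp(int warmupIters, int measureIters) {
        // Warm-up phase: give C2 a chance to profile and compile workload().
        for (int i = 0; i < warmupIters; i++) workload(1_000_000);

        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < measureIters; i++) sink += workload(1_000_000);
        double msPerOp = (System.nanoTime() - start) / 1e6 / measureIters;

        if (sink == 42) System.out.println("unlikely"); // keep result live (avoid DCE)
        return msPerOp;
    }

    public static void main(String[] args) {
        System.out.printf("avg %.3f ms/op after warmup%n", measureMsPerOp(50, 20));
    }
}
```

In practice you'd want JMH instead, which also handles forking, dead-code elimination, and statistical reporting, but the warm-up-then-measure structure is the point here.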
In a couple of benchmarks, Java would probably be faster than C++ if the JVM were allowed to warm up.
Please be specific.
JMH timing for that spectral-norm program was 0.175s to 0.283s faster than the 4.29s elapsed time (16s cpu time).
That's not 10-20% it's 6.6%.
At best that might put the best Java spectral-norm program a little faster than the best Haskell program.
At best that might mean the best Java spectral-norm program took twice as long as the best C++ program.
> … would probably be faster…
Please take those tiny tiny programs and JMH and make your own measurements — you might believe measurements you make yourself.
This is about stack allocation of objects where safe (i.e., when you can prove they don't escape the local scope) for HotSpot, something that already exists in J9 (IBM's JVM), I think.
There is equivalent work for .net core here: https://github.com/dotnet/runtime/blob/master/docs/design/co...
(So the title is somewhat accurate, but you have to do some digging)
> In .NET instances of reference types are allocated on the garbage-collected heap.
> If the lifetime of an object is bounded by the lifetime of the allocating method, the allocation may be moved to the stack.
But my understanding of Chicken is limited - maybe you were asking rhetorically and you knew more?
C2 has two separate spaces in memory: one for the stack and another for the GC nursery (and the rest of the heap). It pops the machine stack when a Java function returns, and clears out the nursery as part of running the GC.
The difference is that the C2 approach to managing the stack (shared by C, etc.) trades away some flexibility for performance. When the machine stack maps 1:1 with the call stack like that, objects allocated on the stack incur no GC cost: they are always freed automatically on function return simply by moving the stack pointer, and are never kept alive and promoted the way they can be in Chicken Scheme's approach.
Also worth noting: I believe .NET Core doesn't necessarily stack-allocate value types either. It seems a lot of conditions can make it unsafe, such as a closed-over value type that could escape the local scope. So value types aren't always stack-allocated either; it's only done when it is safe to do so.
You mean, captured in a closure?
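The Java analog of that case, as a sketch (names are mine): a local captured by a lambda is copied into the synthesized closure object, and if the lambda escapes the method, that closure is heap-allocated and the value outlives the stack frame, so it can't live in a stack slot.

```java
import java.util.function.IntSupplier;

public class CaptureDemo {
    static IntSupplier makeSupplier() {
        int base = 10;         // would be a plain stack slot...
        return () -> base + 1; // ...but it's captured, and the lambda escapes
    }

    public static void main(String[] args) {
        IntSupplier s = makeSupplier(); // makeSupplier's frame is gone here,
        System.out.println(s.getAsInt()); // yet base's value is still needed: 11
    }
}
```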
The linked content is only about Java.