
Cross-Language Compiler Benchmarking - ingve
http://stefan-marr.de/papers/dls-marr-et-al-cross-language-compiler-benchmarking-are-we-fast-yet/
======
trishume
Interesting that Crystal did worse than Java even on many non-GC-stress-test
benchmarks. I wasn't expecting that.

Some theories as to sources of the slowness:

\- The Boehm GC is so much worse that it accounts for bad performance even on
benchmarks that don't stress it.

\- Crystal's polymorphic calls don't use inline caches and JIT speculation so
they aren't as fast as Java's.

\- LLVM isn't as good at bounds check elimination as Java's JIT, because C++
doesn't do bounds checks and so never needed it.

\- The Crystal code used reference types where it normally would use value
types, and Java really has a better code generator for this sort of
object-oriented code.
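
To make the last theory concrete, here is a minimal C++ sketch of the two layouts. The `Vec2` type and both function names are made up for illustration and are not taken from the benchmark suite; it only shows the shape of the difference, not measured results.

```cpp
#include <memory>
#include <vector>

// Hypothetical 2D point type, used only for illustration.
struct Vec2 {
    double x;
    double y;
};

// "Reference type" layout: every element is its own heap allocation,
// so each access has to follow a pointer.
double sum_boxed(const std::vector<std::unique_ptr<Vec2>>& points) {
    double total = 0.0;
    for (const auto& p : points) {
        total += p->x + p->y;
    }
    return total;
}

// "Value type" layout: elements are stored inline and contiguously,
// with no per-element allocation and no pointer to follow.
double sum_inline(const std::vector<Vec2>& points) {
    double total = 0.0;
    for (const Vec2& p : points) {
        total += p.x + p.y;
    }
    return total;
}
```

Both functions compute the same result; the boxed version simply does one allocation per element and touches scattered memory, which is the kind of overhead the theory above is pointing at.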

~~~
mk7
From:
[http://www.oracle.com/technetwork/java/whitepaper-135217.htm...](http://www.oracle.com/technetwork/java/whitepaper-135217.html)
"The Server VM contains an advanced adaptive compiler that supports many of
the same types of optimizations performed by optimizing C++ compilers, as well
as some optimizations that cannot be done by traditional compilers, such as
aggressive inlining across virtual method invocations. This is a competitive
and performance advantage over static compilers. Adaptive optimization
technology is very flexible in its approach, and typically outperforms even
advanced static analysis and compilation techniques."

So HotSpot may detect "hot spot code" at runtime and aggressively inline it -
this is what LLVM cannot do, because it doesn't run at runtime...
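
A rough idea of what "inlining across virtual method invocations" amounts to, written out by hand in C++. The `Shape`, `Square` and `Rect` types and both function names are hypothetical. A JIT would generate the guarded fast path itself from runtime profile data; here the guess that `Square` is the common receiver is simply hard-coded, so treat this as a sketch of the idea rather than what HotSpot actually emits.

```cpp
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square final : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

struct Rect final : Shape {
    double w;
    double h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return w * h; }
};

// Plain virtual dispatch: the call goes through the vtable when the
// compiler cannot prove the receiver's dynamic type.
double area_virtual(const Shape& s) {
    return s.area();
}

// Hand-written speculation: assume the receiver is usually a Square.
double area_speculated(const Shape& s) {
    if (typeid(s) == typeid(Square)) {
        // Fast path: the qualified call is statically bound, so the
        // compiler is free to inline the body of Square::area.
        return static_cast<const Square&>(s).Square::area();
    }
    // Slow path: fall back to ordinary virtual dispatch.
    return s.area();
}
```

The two functions always return the same value; the speculated one just trades a type check for a call that can be inlined when the guess is right.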

------
mk7
It would be interesting to see the comparison with non-GC implementations of
the same algorithms/benchmarks - for example in C...

~~~
smarr
A few of the smaller benchmarks have been ported to C++ already:
[https://github.com/smarr/are-we-fast-yet/tree/wip/cpp/benchm...](https://github.com/smarr/are-we-fast-yet/tree/wip/cpp/benchmarks/C%2B%2B/src)

The bigger ones have not been ported yet, so there are no results for them.

One of the important questions to answer first is how the mapping should be
done. A naive version using new/delete or smart pointers is going to have
performance issues. Another option would be to use arena allocators and
remove memory management overhead from the equation entirely. Depending on
what comparison/C++ usage scenario is desired, both options would be useful.
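
For reference, the arena option could look roughly like the sketch below. This is a minimal bump-pointer arena written for illustration, not code from the are-we-fast-yet repository; `Arena`, `Node`, `build_list` and `sum_list` are hypothetical names. It assumes trivially destructible objects with ordinary alignment, and nothing is freed until the arena itself goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical bump-pointer arena: allocation is a pointer increment and
// all memory is released at once when the arena is destroyed.
class Arena {
public:
    explicit Arena(std::size_t block_size = 64 * 1024)
        : block_size_(block_size) {}

    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    template <typename T, typename... Args>
    T* create(Args&&... args) {
        // Destructors are never run, so only allow types that don't need one.
        static_assert(std::is_trivially_destructible<T>::value,
                      "Arena only supports trivially destructible types");
        void* p = allocate(sizeof(T), alignof(T));
        return new (p) T{std::forward<Args>(args)...};
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    void* allocate(std::size_t size, std::size_t align) {
        // Round the current offset up to the alignment (a power of two).
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (blocks_.empty() || offset + size > block_size_) {
            assert(size <= block_size_);  // no oversized objects in this sketch
            blocks_.push_back(std::make_unique<unsigned char[]>(block_size_));
            offset = 0;
        }
        used_ = offset + size;
        return blocks_.back().get() + offset;
    }

    std::size_t block_size_;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

// Hypothetical linked-list node allocated from the arena.
struct Node {
    int value;
    Node* next;
};

// Builds a list holding 0..n-1 without any per-node delete.
Node* build_list(Arena& arena, int n) {
    Node* head = nullptr;
    for (int i = 0; i < n; ++i) {
        head = arena.create<Node>(i, head);
    }
    return head;
}

long sum_list(const Node* head) {
    long total = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        total += n->value;
    }
    return total;
}
```

A benchmark port along these lines would create one `Arena` per run and drop it at the end, which takes per-object deallocation out of the measurement. Whether that is a fair comparison against the GC'd languages is exactly the open mapping question.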

