
Interfacing with native methods on Graal VM - wippler
https://cornerwings.github.io/2018/07/graal-native-methods/
======
kjeetgill
Awesome. I wonder how well this works on a stock JDK10 using graal.

Whenever I see a speed boost to do what is conceptually the same thing I'm
always curious where the fat was cut. What did we give up? You can dump the
resulting assembly with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`,
and a diff might be revealing.

My hunch is that the line from the tutorial: `@CFunction(transition =
Transition.NO_TRANSITION)` makes all the difference. Explanation of
NO_TRANSITION from [0]:

No prologue and epilogue is emitted. The C code must not block and must not
call back to Java. Also, long running C code delays safepoints (and therefore
garbage collection) of other threads until the call returns.

Which is probably great for BLAS-like calls. This lines up with my
understanding from Cliff Click's great talk "Why is JNI Slow?"[1] basically
saying that to be faster you need to make assumptions about what the native
code could and couldn't do, and that developers would generally shoot
themselves in the foot.
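
For reference, a binding in the style of the tutorial looks roughly like this
(a sketch only: it compiles against the GraalVM SDK and runs under
native-image, and binding libm's `sin` here is just an illustrative choice,
not the article's example):

```java
import org.graalvm.nativeimage.c.function.CFunction;
import org.graalvm.nativeimage.c.function.CFunction.Transition;

public class FastMath {
    // NO_TRANSITION: no prologue/epilogue is emitted around the call,
    // so the callee must be quick, must not block, and must never call
    // back into Java -- safepoints (and thus GC of other threads) are
    // delayed until it returns.
    @CFunction(value = "sin", transition = Transition.NO_TRANSITION)
    public static native double sin(double x);
}
```

The trade-off is exactly the one from the talk: the VM gives up the
bookkeeping that makes the call safe in the general case, in exchange for
overhead close to a plain C function call.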

[0]:
[https://github.com/oracle/graal/blob/master/sdk/src/org.graa...](https://github.com/oracle/graal/blob/master/sdk/src/org.graalvm.nativeimage/src/org/graalvm/nativeimage/c/function/CFunction.java)
[1]:
[https://www.youtube.com/watch?v=LoyBTqkSkZk](https://www.youtube.com/watch?v=LoyBTqkSkZk)

~~~
Twirrim
A team I was on in the past had a well-known performance bottleneck in its
most performance-critical component. It was one that couldn't be avoided or
minimised: it was called with high frequency, though each call, wall clock
wise, didn't take too long.

"JNI is slow", being the conventional wisdom, and knowing just how frequent
the calls would be, people had ignored it as an option.

One day, one of the devs most bothered by the bottleneck had an hour spare,
threw the conventional wisdom out the window, dropped in JNI calls to a
standard (highly optimised) library, and re-benchmarked: a 40% performance
boost. Further experiments found that "JNI is slow" isn't as true as
conventional wisdom quite had it.

------
repolfx
There's an effort to bring a more modern FFI to Java that works similarly to
the one described in the article, called Project Panama. It has tools to
convert C header files into the equivalent annotated Java definitions and is
intended to help improve performance as well.

You can follow along here:

[http://mail.openjdk.java.net/pipermail/panama-dev/](http://mail.openjdk.java.net/pipermail/panama-dev/)
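
Panama's API has churned a lot, but for flavor, here's roughly what a
downcall looks like in the `java.lang.foreign` API the project eventually
stabilized (JDK 22+) — a sketch binding libc's `strlen` with no hand-written
glue code:

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class StrlenDemo {
    static long strlen(String s) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // Look up strlen in the default lookup (libc) and describe its
        // signature: size_t strlen(const char *)
        MethodHandle h = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // Copy the Java string into native memory as a NUL-terminated
            // C string, then make the downcall
            MemorySegment cstr = arena.allocateFrom(s);
            return (long) h.invokeExact(cstr);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(strlen("hello world")); // prints 11
    }
}
```

Note how the binding is declared in pure Java from a function descriptor; the
jextract tool mentioned above generates these declarations from C headers.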

The same project is also adding support for writing vector code in Java (SSE,
AVX etc).

~~~
agibsonccc
Disclaimer: I'm affiliated with a semi-competing project to Panama called
javacpp:
[https://github.com/bytedeco/javacpp](https://github.com/bytedeco/javacpp)

I can say for a fact that Panama is not seriously targeting this space. We
implement a ton of that native code today, and it works with C++ _and_ actual
Android, _today_. We also handle GPUs. Project Panama is only targeting C,
and even then will only do it in a cross-platform, non-committal fashion.
They aren't doing it the way they should be in order to properly target
native vectorized code.

We know this from experience, because this is all we do:
[https://github.com/deeplearning4j/deeplearning4j](https://github.com/deeplearning4j/deeplearning4j)
[https://github.com/bytedeco/javacpp-presets](https://github.com/bytedeco/javacpp-presets)

We tried seeing if we could get some of this work in to the JDK, but their
goals fundamentally compete with what it takes to get vector math to be fast.
It's also not nearly as ambitious as it needs to be to handle real world
tensor workloads.

~~~
bitmapbrother
>Project Panama is only targeting C, and even then will only do it in a
cross-platform, non-committal fashion

John Rose of Oracle:

 _Panama is not just about C headers. It is about building a framework in
which any data+function schema of APIs can be efficiently plugged into the
JVM. So it's not just C or C++ but protocol specs and persistent memory
structures and on-disk formats and stuff not invented yet. We've been
relentless about designing the framework down to essential functionality
(memory access and procedure calls), not just our (second-)favorite language
or compiler._

 _The important deliverable of Panama is therefore not Posix bindings, but
rather a language-neutral memory layout-and-access mechanism, plus a language-
neutral (initially ABI-compliant) subroutine invocation mechanism. The
jextract tool grovels over ANSI C (soon C++) schemas and translates to the
layouts and function calls, bound helpfully to Java APIs with unsurprising
names. But the jextract tool is just the first plugin of many._

 _We do look forward to building more plugins for more metadata formats
outside the Java ecosystem, such as what you are building._

 _In fact, I expect that, in the long run, we will not build all of the
plugins, but that people who invent new data schemas (or even data+function
schemas or languages) will consider using our tools (layouts, binder, metadata
annotations) to integrate with Java, instead of the standard technique, which
is to write a set of Java native functions from scratch, or (if you are very
clever) with tooling. The binder pattern, in particular, seems to be a great
way to spin repetitive code for accessing data structures of all sorts, not
just C or Java. I hope it will be used, eventually, in preference to static
protocol compilers. The JVM is very good at on-line optimization, even of
freshly spun code, so it is a natural framework for building a binder._

>They aren't doing it the way they should be in order to properly target
native vectorized code.

Which is interesting since Intel is the one contributing the majority of the
vector code changes.

~~~
agibsonccc
Yes, that's what I stated above. I've also stated that I haven't just read
the news: we've talked to that team in person. Being language/platform
neutral does not mean it is going to fulfill most use cases people would have
for C bindings. Java tends to be "good enough" for a lot of use cases out of
the box. It might help a bit with libraries like netty and memory management,
but it's not going to work on real-world math code which, as I stated, is our
main use case.

That codegen isn't going to match what you need to do for real speed on cpus
or gpus when writing vectorized math code.

Re: his last point. That's exactly what we talked to that team about. We don't
feel those tools are going to work for real world use cases. We already do the
codegen and auto bindings/mapping ourselves in addition to the memory
management ourselves.

------
Reason077
Back in the day, GCC's Java native compiler "GCJ", had an alternative native
method interface called CNI.

GCC recognized _extern "Java"_ in headers generated from class files. You
could then call (gcj-compiled) Java classes from C++ as if they were native
C++ classes, as well as implement Java "native" methods in natural C++.

The whole thing performed a lot better than JNI since it was, more or less,
just using the standard platform calling conventions. Calling a native CNI
method from Java had the same overhead as any regular Java virtual method
call.

Ultimately, GCJ faded away because there wasn't a great deal of interest in
native Java compilation back then, and too many compatibility challenges in
the pre-OpenJDK days. But it's interesting to see many of its ideas coming
back now in the form of Graal/GraalVM.

~~~
pjmlp
There was interest in native Java compilation, not in doing the work for free.

Most third-party commercial Java SDKs do have support for native compilation,
especially in the embedded space.

Around 2009 GCJ suffered an exodus of developers to OpenJDK.

