Hacker News
Java Reflection, 1000x Faster (norswap.com)
139 points by norswap 5 months ago | 55 comments

> The simple approach is to simply add a children() method to the interface and implement it in every kind of node. Very tedious, and boilerplaty as hell.

The best approach without boilerplate would be to generate this at compile time, not runtime. You could use an approach like in http://hannesdorfmann.com/annotation-processing/annotationpr... to create at compile-time, using reflection, a ChildWalker class that accesses the relevant properties for every class annotated with @Node. At runtime, the JIT should get this down to a vtable lookup and field lookups - just about as optimized as you can get.
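For illustration, here is the kind of plain Java such a processor could emit for one hypothetical @Node class (IfNode, IntNode, and IfNodeWalker are made-up names, not from the article):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

interface Node { }

// A leaf node with no Node-typed fields.
class IntNode implements Node {
    final int value;
    IntNode(int value) { this.value = value; }
}

// A node whose children live in three differently named fields.
class IfNode implements Node {
    final Node condition, thenBranch, elseBranch;
    IfNode(Node c, Node t, Node e) { condition = c; thenBranch = t; elseBranch = e; }
}

// What the generated walker could look like: plain field reads, no
// reflection left at runtime, so the JIT sees ordinary code it can inline.
class IfNodeWalker {
    static List<Node> children(IfNode n) {
        return Arrays.asList(n.condition, n.thenBranch, n.elseBranch);
    }
    static List<Node> children(IntNode n) {
        return Collections.emptyList();
    }
}
```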

EDIT: Always remember https://xkcd.com/1205/ - if it would take 5 seconds to write each children() method, and you write no more than 5 Node classes per day (amortized), then you shouldn't spend more than 12 hours writing the code for this blog post ;) Unless, of course, that was the point all along!

Seeing all these comments about code generation at compile time makes me wonder if there's an overlap between people who use/write these kinds of tools for complex languages and those who post 'what's so great about (Lisp) macros?'.

eg: JSX users.

Although not a realistic option for everyone, I find C++ templates to be simpler than Java annotation processors for type-safe compile-time code gen.

An annotation processor does not seem to be a good fit here. Annotation processors can only generate new classes, not modify existing ones. So you would have to make all classes abstract and then generate non-abstract ones that have a #children() method and refer to the generated classes everywhere in your code.

There are ways if you're not too worried about using private compiler APIs: http://scg.unibe.ch/archive/projects/Erni08b.pdf

I'm really not a fan of these "clever" hacks that rely on compiler internals. That doesn't even work on ecj.

Developing, debugging, and using an annotation processor is the opposite of productive. Not recommended.

I wrote an annotation processor for [0].

If you use Kotlin instead of Java it's way better, because there are no nullability issues and it has a super-useful standard library.

And once you've figured out how to hook it into gradle, IDEA etc., which admittedly cost me an hour or so, you can reuse that knowledge for every annotation-processor to come.

I made it a habit to set myself a time-limit EVERY time I start touching the build-process. After the limit has passed it must be faster, more automated, or in any other way more useful, or I will revert the changes.

[0] https://medium.com/@spitzwegerich/generated-json-serialisati...

It really depends on the annotation processor and what kind of code you are trying to generate. In the author's case it shouldn't take that long nor would it be hard to develop and debug.

I think it depends more on your IDE playing ball with the code generator and custom build steps. Most do an awful job.

I'd love one that I could just point to a build tool/script instead of it creating its own.

Also, I've worked at places with this magic and no induction or documentation or time for anyone to explain anything. The more behind-the-scenes, hard-to-discover stuff you implement, the better the docs and onboarding need to be. It can't just be done to fulfill an architect-astronaut fantasy.

I don't care so much about the documentation, as long as it's part of the build script (its own documentation) and not checked into source control. By not checking it into source control, it won't be edited and there won't be a mix of generated and custom code. By being part of the build process, you avoid the generation tool and scripts being lost to the sands of time.

Personally, I like the build process to be committed, for build reproducibility. It can be in a separate, more secured repository though.

Or, heck, just hack together a little script to generate the code and hook it into your build system. You don't need anything as fancy as an annotation processor. A handful of Python could accomplish this just as well.

But how do you access the original AST in your python?

There is no original AST. The Python outputs the AST classes, so it is the source of truth.

I'm reading annotations on Java code and generating classes from them, so I need to parse Java first.

Indeed, code generation was going to be my permanent solution, until this turned out fast enough not to bother!

Seems like I'm doing fine by xkcd, at least if you remove the time to make a simpler version for the blog post =) And the learning has some value too anyhow.

Seems to me this is approaching the problem totally the wrong way. First off, the current implementation just loops through all the methods of the implementing class, and the first one that matches a Node or [Node] return type is supposedly its children? That seems extremely brittle.

The fact that you have 100 classes implementing this is a good sign you should turn the design on its head. Why not just have a single tree structure built using generics?

edit: also I would point out that you're not really making reflection any faster, just your code

On top of that, he acknowledges that there is a much, much simpler solution, known from the dawn of Java, and this is to have a polymorphic children() method.

But that would (gasp!) require adding those implementations. Which is tedious (it is not with a bit of inheritance or some other technique but ok). OK, so, it requires a bit of work. But those methods would take a few (or a dozen) nanoseconds to dispatch; you can't go faster than that. What's the alternative? To build your brittle infrastructure? Hmmm, I must be missing something...
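A minimal sketch of that polymorphic approach (the node classes here are illustrative, not the article's):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

interface Node {
    List<Node> children();
}

class IntNode implements Node {
    final int value;
    IntNode(int value) { this.value = value; }
    public List<Node> children() { return Collections.emptyList(); } // leaf node
}

class AddNode implements Node {
    final Node left, right;
    AddNode(Node left, Node right) { this.left = left; this.right = right; }
    // One virtual dispatch plus two field reads: a few nanoseconds per call.
    public List<Node> children() { return Arrays.asList(left, right); }
}
```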

This is assuming you have full control over all the implementations. I see this frequently in large Java and Scala code bases in enterprise: you have a generic library you built a while back that now has all sorts of users in all different places, and you may not even have source code access to them, only jar access through Nexus or Artifactory. Then you need to do some operation like OP notes, but you can't expect everyone to commit your code immediately or even at all. Thus OP is left with the only reasonable solution, as detailed.

So your complaint is that it is hard for users if you make a breaking change in your library, big surprise. This is why breaking changes in libraries should only be done with utmost consideration, preferably after first deprecating the usage.

And if you make a breaking change to the interface, you better be damn well breaking it at compile time and not hiding it with reflection shenanigans.

In the example of the OP, it's easy to think of an example that would totally be broken by the author's reflection approach, while the user has no warning at compile time and no reasonable way of knowing that what he's doing isn't allowed:

    class MyWeirdNode implements Node {
        private final int i = 1;
        public IntNode asIntNode() {
            return new IntNode(i);
        }
    }

That's where client provided implicit converters in Scala are actually useful magic

I think you're confused about the "heterogeneous tree" part. All implementations of Node have different children, held in different fields with different names and varying in number. This can't be simplified by inheritance; you need to write an implementation per class.

> The fact that you have 100 classes implementing this is a good sign you should turn the design on its head. Why not just have a single tree structure built using generics?

Does anyone know why GUI toolkits never do this? Every one seems to follow an OO model where each widget owns its own children. Even in GTK they hacked in an OO system to do it instead of having a separate tree.

Because almost every GUI component is responsible for sizing and positioning its own children. A separate tree might be useful for some other optimisations I can't think of, but it would definitely be a pain for components, even those with a fixed number of children (as opposed to "container" components).

Because GUIs handle data providers and you have to provide your own adapters to those (so it's even MORE boilerplate)

I'm not seeing how I can use some generic tree class without adding code for each Node subclass, which is exactly what I'm trying to avoid.

At the least, I'd have to populate the children list of that class (which is more or less what children() would have done). The identity of each child (what it represents), such as "condition of a while statement" or "body of the else part of an if statement" must be preserved.

Also, the issue with methods really isn't one, because these classes are really POJOs. Getters and methods inherited from Object are pretty much all they have, so I'm not worried about accidentally capturing a method that is not a getter.

And well, this makes reflection calls faster by replacing them (probably) with the equivalent plain code.

If you are using reflection and encountering performance problems, the correct approach is pretty much always to write an annotation processor and generate that code instead.

Or generate code without using an annotation processor.

ReflectASM is the fastest, most general purpose reflection solution. It uses runtime bytecode generation.


When I needed something more custom, I've done the bytecode strategy a few times, using the ASM library or equivalent. But the cost of change (brittleness, aggravation) is pretty high.

Now I just write Java source and use the JDK's built-in javac to generate the bytecodes.

Some people like StringTemplate. http://www.stringtemplate.org For whatever reason, I still prefer Velocity. Template languages are mostly evil, and variations of terrible, so I just stick to what I know.

Annotations are never the correct answer. For any use case, purpose.

Edit: Whoops. May have spoken too soon. JDK 8's MethodHandle and some other features (inlining simple getters) may have mooted ReflectASM. https://github.com/EsotericSoftware/reflectasm/issues/23 I may have to retool some of my older code.

JavaPoet is a relatively nice library for generating Java sources: https://github.com/square/javapoet

Thanks. I tried JavaPoet, once upon a time.

I'm just a simple bear. I can't keep all the details straight.

One strategy I use to cope is round tripping, as a way to confirm I get the expected results.

My current approach to code generation: painstakingly hand craft the desired source code output. Then turn that into a template (progressive refinement). Then compare (diff) generated output to manual. Iterate. Many times that means adjusting target to make it easier to generate.

Using bytecodes (class files) as the target output adds complexity and more steps. I've been reversing bytecodes for a long time, manually using javap, the decompilers (which are now quite good), and ASM's code generator plugin for Eclipse (super clever). But it's too fussy compared to the Java source code strategy.

> Annotations are never the correct answer. For any use case, purpose

That's a pretty absurd claim, especially given their popularity.

What do you use to annotate elements present in a Java source with metadata?

Why XML of course!

Didn't look too deep, but this might be due to the fact that call sites are memoized for invoke calls as part of the invokedynamic improvements (but really MethodHandle::invoke and MethodHandle::invokeExact are where the magic is as is mentioned by the JVM spec IIRC).

Regardless, MethodHandle invocations are always going to be faster than reflection Method invocations. A primary reason is that the parameters for invoke and invokeExact are not subject to the same boxing and runtime-validation requirements as reflection.
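As an illustration of that call-site difference, a sketch using only the standard java.lang.invoke API (String::length stands in for an arbitrary target method; this is not the article's code):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class InvokeComparison {
    // Reflective call: the int return value is boxed to Integer and the
    // arguments and access rights are validated on every invocation.
    static int viaReflection(String s) {
        try {
            Method m = String.class.getMethod("length");
            return (Integer) m.invoke(s);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // MethodHandle call: invokeExact is typed (String)int at the call site,
    // so there is no boxing and no per-call validation once the handle is
    // linked. (In real code the handle would live in a static final field.)
    static int viaHandle(String s) {
        try {
            MethodHandle mh = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            return (int) mh.invokeExact(s);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Storing the handle in a static final field matters in practice: it lets the JIT treat the handle as a constant and inline through the call.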

> I had an interface (representing a tree node)

No you didn't. An interface representing a tree node would have a children() method.

Why? I see why it might be a good idea depending on what problem you are trying to solve, but I don't see why anything has to have any references to its children in order to be a tree node.

I assume the JIT isn't optimizing until the functions execute a certain number of times. Given the speed difference maybe it isn't even natively compiling at first and using the interpreter.

I don't know about Java, but in .NET there's a call you can make to force the dynamically generated lambda expression to be compiled immediately.

Of course if you're being really adventurous you can use IL emit to build a function up from IL opcodes.

Yes. The author didn't note his testing methodology. JMH (Java Microbenchmark Harness) is the gold standard for executing Java benchmarking tests:


It takes care of details like JVM warm-ups, ensuring sufficient invocations for JIT compilation to have occurred, etc.

Although sometimes the performance when interpreted before warm-up is important.

Testing with CompilerControl.Mode.EXCLUDE in JMH is still more reliable than testing manually, when you don't know if it got compiled or not.

Interpreted code is a case where method handles should be much faster than the generated accessor classes used by reflection - lambdas and method references are faster than an inner class implementing a functional interface before jit compilation. Afterwards, performance tends to be sameish.

That's the kind of solution I've used before when performing reflection in a tight loop. The important thing there is to cache the result of the compiled expression or the dynamic IL code because the compile time/emit is slower than a single use of reflection.

> The important thing there is to cache the result of the compiled expression or the dynamic IL code because the compile time/emit is slower than a single use of reflection

I came across some code a few weeks ago that skipped this step. It wasn't called remotely often enough to justify the complication in the first place (low volume internal web app) but it didn't cache the result so it ended up making things slower. Thanks to the wonders of RenderAction in MVC it compiled the dynamic IL 20 times per page view as well.

In the past, I've used Introspector API [1] for property reflection. It's part of the JDK, and provides a lot of the caching this article is talking about. Doubt this would provide the same level of improvement as adding the direct children() method, or even ASM based approaches, but it's certainly faster than naked reflection.
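A minimal sketch of that Introspector usage (the Point bean is a made-up example):

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class BeanDemo {
    public static class Point {
        private int x;
        public int getX() { return x; }
        public void setX(int x) { this.x = x; }
    }

    // Introspector caches BeanInfo per class, so repeated lookups avoid
    // re-scanning the class's methods the way a naive getMethods() loop would.
    static List<String> propertyNames(Class<?> cls) {
        try {
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd :
                    Introspector.getBeanInfo(cls, Object.class).getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new RuntimeException(e);
        }
    }
}
```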

[1] https://docs.oracle.com/javase/7/docs/api/java/beans/Introsp...

There's another, and IMO, a better way to make Java reflection faster. It's to disable checks on reflection calls. Just call setAccessible(true), and you will notice substantial improvement.
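A minimal sketch (the class and field here are made up):

```java
import java.lang.reflect.Field;

public class AccessDemo {
    private final int value = 42;

    static int readValue(AccessDemo target) {
        try {
            Field f = AccessDemo.class.getDeclaredField("value");
            // Suppresses the Java language access check that would otherwise
            // run on every get/set; that per-call check is part of what makes
            // naked reflection slow.
            f.setAccessible(true);
            return f.getInt(target);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

In a hot loop the Field object itself would also be cached, since getDeclaredField is itself a reflective lookup.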

Isn't setAccessible required before you can interact with fields?

If they are private, yes.

he does that

What I do for these situations is do the reflection as a unit test to confirm that the hand written children() is correct.

When the unit test fails for a node type, it prints what children() should be, and you just copy and paste that code.

This is also nice for the case where nodes have some sort of special children(). You can make the unit test ignore those.

I wonder if this could be used by JVM languages that sometimes fall back to reflection, like Clojure. Transparently profile the reflection calls and recompile the hot ones?

Clojure provides a var called warn-on-reflection that can be set to true during dev, and all (or a vast majority of) reflective calls can be optimized away by adding type hints wherever it warns. This is to say that Clojure devs get rid of reflection instead of trying to make it faster.

Sounds like doing it automatically, instead of requiring code changes, would save a lot of work and speed up many programs where the programmers haven't gone through the trouble of doing this.

I wonder if the clojure devs use these techniques. IIRC clojure relies a lot on reflection.

Clojure devs get rid of reflection by adding type hints where required. The compiler has a flag to help with this, called warn-on-reflection.

C# reflection is also slow, and even slower in Unity3D with the IL2CPP ahead-of-time compiler, where there is no just-in-time compiler or System.Reflection.Emit to dynamically generate code at runtime (which is impossible and prohibited on iOS, for example).

Here are the limitations of the Mono runtime:


However, there are ways of compiling the little snippets of the code you need ahead of time, then plugging them together with delegates at runtime, which avoids much of the overhead.

Much of the cost is the overhead of converting from unknown types to specific types. So you can pre-compile loosely typed adaptor thunks that take generic "object" arguments, convert them to the required type, and directly call strongly typed delegates (which you can plug into the adaptor, so adaptors can be used for any number of compatible delegates).
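Translated into Java terms as a rough sketch (the Thunk interface and adapt helper here are hypothetical, not taken from the linked C# code):

```java
import java.util.function.BiFunction;

public class ThunkDemo {
    // Loosely typed adaptor signature: everything is Object, so one Thunk
    // type can stand in for calls with many different signatures.
    interface Thunk { Object invoke(Object target, Object arg); }

    // Pre-compiled adaptor: casts the untyped arguments once, then calls a
    // strongly typed delegate directly, with no reflection on the hot path.
    static <T, A, R> Thunk adapt(Class<T> cls, Class<A> argCls, BiFunction<T, A, R> fn) {
        return (target, arg) -> fn.apply(cls.cast(target), argCls.cast(arg));
    }
}
```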

Check out how the following code uses FastInvoke, ToOpenDelegate, Delegate.CreateDelegate, Expression.GetActionType, and Expression.GetFuncType.

It may be that some of these general techniques could be applied to Java.

Faster Invoke for reflected property access and method invocation with AOT compilation:


The bane of the iOS programmer's life, when working with reflection in Mono, is that you can't go around making up new generic types to ensure that your reflected properties and methods get called at decent speed. This is because Mono on iOS is fully Ahead Of Time compiled and simply can't make up new stuff as you go along. That, coupled with the dire performance of Invoke when using reflected properties, led me to construct a helper class.

This works by registering a series of method signatures with the compiler, so that they are available to code running on the device. In my tests property access was 4.5x faster and method access with one parameter was 2.4x faster. Not earth-shattering, but every little helps. If you knew what you wanted ahead of time, then you could probably do a lot better. See here for info.

You have to register signatures inside each class, I'm afraid. Nothing I can do about that.

So to register a signature you use:

    static MyClass() {
         //All methods returning string can be accelerated
         DelegateSupport.RegisterFunctionType<MyClass, string>();
         //All methods returning string and taking an int can be accelerated
         DelegateSupport.RegisterFunctionType<MyClass, int, string>();
         //All methods returning void and taking a bool can be accelerated
         DelegateSupport.RegisterActionType<MyClass, bool>();
    }
Then when you have a MethodInfo, you use the extension method FastInvoke(object target, params object[] parameters) to call it. FastInvoke will default to normal Invoke if you haven't accelerated a particular type.


    myObject.GetType().GetMethod("SomeMethod").FastInvoke(myObject, 1, 2);

You can download the source code for FastInvoke from here.


Newer version from unityserializer-ng:


