
Impossible Java - barahilia
https://barahilia.github.io/blog/computers/2017/03/26/impossible-java.html
======
teraflop
Fun fact: in Java versions 5 and 6, it was actually possible to write valid
Java source code with overloaded return types!

The trick is that generic types in Java are subject to type erasure at
runtime. Due to an oversight, it was possible to declare a class with methods
like:

    
    
        String foo(List<X> l);
        double foo(List<Y> l);
    

which would be erased to:

    
    
        String foo(List l);
        double foo(List l);
    

At runtime, even though the type information for your List was no longer
available, the JVM could still locate the correct method using the
method descriptor stored in the caller's bytecode, giving the _appearance_ of
return-type dispatch. Technically this violates the Java language spec, and
javac 7 was updated to be stricter and prevent this sort of code:
[http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6182950](http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6182950)

I guess it's ultimately not that mysterious, but I first encountered it "in
the wild", and ended up scratching my head for a while before I figured out
why our code suddenly stopped compiling when we upgraded the JDK.

~~~
jxi
Interesting. Then, why isn't the bytecode always stored with the caller, that
way you _can_ have return-type dispatch?

~~~
josefx
The Java bytecode stores it, however the Java language normally does not
expose it. Having the compiler select the correct overload based on return
type would most likely add a lot of complexity to the language itself ( have
you seen the current overload resolution rules? ) without much benefit.

I think the compiler actually has to work around that when you narrow the
return type of an overridden function: a method Object Base::get() overridden
with Integer Child::get() will result in an additional compiler-generated
Object Child::get() in the bytecode.
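
For anyone who wants to see that extra method, here is a minimal sketch using reflection. The class names `CovariantBridgeDemo`, `Base` and `Child` and the helper `describeGetMethods` are made up for illustration; only `getDeclaredMethods()` and `isBridge()` are real reflection APIs.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class CovariantBridgeDemo {
    // Hypothetical classes mirroring Base::get() / Child::get() from the comment.
    static class Base {
        Object get() { return "base"; }
    }

    static class Child extends Base {
        @Override
        Integer get() { return 42; } // narrowed (covariant) return type
    }

    /** Describes every method named "get" that Child itself declares. */
    static List<String> describeGetMethods() {
        List<String> out = new ArrayList<>();
        for (Method m : Child.class.getDeclaredMethods()) {
            if (m.getName().equals("get")) {
                out.add(m.getReturnType().getSimpleName() + " get() bridge=" + m.isBridge());
            }
        }
        out.sort(null); // reflection order is unspecified, so sort for stable output
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        //   Integer get() bridge=false
        //   Object get() bridge=true
        describeGetMethods().forEach(System.out::println);
    }
}
```

The source only declares one `get()` in `Child`, yet reflection reports two, differing only in return type.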

~~~
jdmichal
Yep. They're called "bridge" methods. Though, interestingly, there's only one
point that they're mentioned in the language spec, specifically in the case of
erasure. Even though they were needed before erasure for exactly this reason.

[https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.htm...](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#d5e26528)

------
nebulous1
tl;dr Java needs to be able to decide which overloaded method implementation
to use at compile time, meaning you can't differentiate between method
implementations based on return type alone. However, when compiled to
bytecode, each method and method call includes the return type as part of the
method identification, so Java bytecode _is_ actually capable of
differentiating between implementations with the same name based on return
type alone. This fact is used by bytecode obfuscators, and can lead to bugs in
decompiled code if the decompiler doesn't account for it.
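
A small sketch of what "return type as part of the method identification" looks like in practice, using `java.lang.invoke.MethodType` to print JVM method descriptors. The class name `DescriptorDemo`, the helper `descriptorFor` and the method name `foo` are assumptions for illustration only.

```java
import java.lang.invoke.MethodType;
import java.util.List;

public class DescriptorDemo {
    /** JVM descriptor for a method taking a (raw) List and returning the given type. */
    static String descriptorFor(Class<?> returnType) {
        return MethodType.methodType(returnType, List.class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Same name, same erased parameter list, different return types:
        System.out.println("foo" + descriptorFor(String.class)); // foo(Ljava/util/List;)Ljava/lang/String;
        System.out.println("foo" + descriptorFor(double.class)); // foo(Ljava/util/List;)D
    }
}
```

The two descriptors differ after the closing parenthesis, which is all the JVM needs to tell the methods apart.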

~~~
masklinn
> Java needs to be able to decide which overloaded method implementation to
> use at compile time, meaning you can't differentiate between method
> implementations based on return type alone.

That's not correct. Rust has no issue statically dispatching based on the
return type.

Java-the-language does not allow it so it does not have to deal with calls
which don't use the return value e.g.

    
    
        int getSomething()
        String getSomething()
    

If the return value is not used, you have to explicitly disambiguate this call
somehow, and Java provides no way to do so.

~~~
bradleyjg
Even if you have a language that forces you to assign the return value, what
does the compiler do in a situation like:

    
    
      Integer getSomething()
      String getSomething()
    

when the calling site is:

    
    
      Object something = obj.getSomething();

~~~
tazjin
The compiler would tell you that it can't disambiguate that and fail. It's a
type error.

~~~
bradleyjg
I guess that makes sense. It is similar to:

    
    
      void putSomething(Integer i)
      void putSomething(String i)
    

and

    
    
      obj.putSomething(null)
    

which I think is a compile error in Java.
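
A quick sketch of that case, assuming a made-up class `NullOverloadDemo`. The overloads return a String here (instead of void) purely so the chosen overload is visible; javac rejects the bare `null` call as ambiguous, and a cast picks one overload explicitly.

```java
public class NullOverloadDemo {
    static String putSomething(Integer i) { return "Integer overload"; }
    static String putSomething(String s)  { return "String overload"; }

    public static void main(String[] args) {
        // putSomething(null); // does not compile: the reference to putSomething is ambiguous

        // A cast gives the argument a static type, so overload resolution succeeds.
        System.out.println(putSomething((Integer) null)); // Integer overload
        System.out.println(putSomething((String) null));  // String overload
    }
}
```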

------
bmc7505
I was surprised to learn that in Kotlin, it is possible to disambiguate
overloaded functions based only on their return type. I had no idea the JVM
even supports such semantics. [1]

[1]:
[http://stackoverflow.com/q/42916801/1772342](http://stackoverflow.com/q/42916801/1772342)

~~~
mbel
According to your link, the JVM does not support this feature. It's implemented by
Kotlin itself. I guess it's probably some kind of name-mangling scheme.

~~~
masklinn
According to the link Java/Javac does not support the feature.

The JVM does, `bar(foo: List<String>): String` can be compiled to
`bar(Ljava/util/List;)Ljava/lang/String;` and `bar(foo: List<Int>): Int` to
`bar(Ljava/util/List;)Ljava/lang/Integer;` or somesuch, there is no ambiguity
at the bytecode level.

In fact, the documentation for Class#getMethod specifically outlines this
issue[0]:

> Note that there may be more than one matching method in a class because
> while the Java language forbids a class to declare multiple methods with the
> same signature but different return types, the Java virtual machine does
> not.

[0]
[http://docs.oracle.com/javase/8/docs/api/java/lang/Class.htm...](http://docs.oracle.com/javase/8/docs/api/java/lang/Class.html#getMethod-java.lang.String-java.lang.Class...-)
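
Since `getMethod` only takes a name and parameter types, picking one of several same-signature methods by return type means scanning the declared methods yourself. A rough sketch, where `ReturnTypeLookupDemo`, `Fun` and `find` are hypothetical names and the bridge method javac generates for `Supplier<String>` supplies the second `get()`:

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Optional;
import java.util.function.Supplier;

public class ReturnTypeLookupDemo {
    // javac emits String get() plus a synthetic bridge Object get() for this class.
    static class Fun implements Supplier<String> {
        @Override
        public String get() { return "hello"; }
    }

    /** Finds a declared method by name, parameter types AND return type. */
    static Optional<Method> find(Class<?> owner, String name,
                                 Class<?> returnType, Class<?>... params) {
        return Arrays.stream(owner.getDeclaredMethods())
                .filter(m -> m.getName().equals(name))
                .filter(m -> m.getReturnType() == returnType)
                .filter(m -> Arrays.equals(m.getParameterTypes(), params))
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(find(Fun.class, "get", String.class));  // the declared method
        System.out.println(find(Fun.class, "get", Object.class));  // the bridge method
        System.out.println(find(Fun.class, "get", Integer.class)); // Optional.empty
    }
}
```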

~~~
mbel
The article describes Dalvik which is different from JVM, so it's not really a
proof.

But you are actually right. It looks like JVM bytecode includes the full method
signature in each method invocation:
[https://www.ibm.com/developerworks/library/it-haggar_bytecod...](https://www.ibm.com/developerworks/library/it-haggar_bytecode/)
(if I'm reading the examples correctly).

~~~
masklinn
> The article describes Dalvik which is different from JVM, so it's not really
> a proof.

My comment describes the actual JVM, hence having linked to official Java/JVM
documentation.

~~~
mbel
Sorry, I misread it.

------
mooman219
Not only is this possible, as stated by the article, this level of overloading
is incredibly useful in a number of areas and has been used for a long
time.

One use case is API backwards compatibility. If your API wants to change the
return type of a function, say from int to double, but also wants to maintain
binary backwards compatibility, you can do that. See OverMapped [
[https://github.com/Wolvereness/OverMapped](https://github.com/Wolvereness/OverMapped)
].

Obfuscation is another area, and ProGuard employs this to make decompiling
more difficult iirc.

------
matharmin
This is about the Dalvik bytecode format, but the same applies to standard Java
bytecode files. Practically any obfuscated Java code will have this, which
makes reverse engineering much more difficult without the tools to handle it.

~~~
HighlandSpring
Why is it not as simple as mapping the method call instructions to
returntype_methodname format? Am I missing something?

~~~
Flowdalic
What if the call site does not use the returned value?

~~~
masklinn
The bytecode encodes that information, the method identifier includes name,
parameter types and return type.

------
DorothySim
The same thing exists in .NET IL where you can overload methods based only on
return values (among other interesting things like modopt/modreq [0] etc.).

[0]: [http://stackoverflow.com/a/5294456](http://stackoverflow.com/a/5294456)

------
Gaelan
> Any compiler of sound mind and memory will issue an error

GHC would beg to differ.

~~~
masklinn
Also rustc, though it doesn't quite have overloading in the Java sense:
[https://is.gd/s0oF79](https://is.gd/s0oF79)

~~~
floatboth
GHC doesn't either. Rust traits are heavily inspired by Haskell typeclasses…
so it's like in your example.

------
mnarayan01
It seems like this is simply a disassembler error (albeit an understandable
one). Am I missing something?

Edit: Based on the responses below, I guess the point is that the disassembler
can't generate Java code that will "naively" (wrong word, but I can't think of
a better one) generate the same output. Notable (I assume) in that name
munging would be problematic outside the current compilation unit.

~~~
burkaman
No, Java class files usually include the names of variables and functions, so
this isn't the disassembler's fault. The class file actually had two functions
with the same name. You could certainly implement an anti-obfuscation layer to
detect stuff like this, but I wouldn't call it an "error" as is.

~~~
ciniglio
I think it's a common expectation that a disassembler should provide output
that is valid to be compiled, and that therefore this is an error.

~~~
burkaman
Sure, but it's also expected that it should provide output that could be
compiled to produce the input, and in this case it's impossible to satisfy
both those constraints. The best thing would probably be to leave a comment in
the generated source code explaining the problem, and provide an option to
rename overlapping functions.

------
cremp
It should be noted that the JVM allows for things that javac does not. This is
one of those things, and the article describes it accurately, albeit in the
Android Java environment; it applies to the standard JVM too.

One of the more fun things I've done is use the ASM and BCEL libraries to make
(and unmake) these kinds of manipulations (manipulations that javac won't let
you do.)

------
anonymousDan
I believe the technical term for the correspondence between what it is
possible to compile and what is valid bytecode/machine code is fully abstract
compilation. It's an interesting concept with many interesting implications
(e.g. for security). In the past at least there were various examples of Java
programs that were illegal in the language but nonetheless could be created
directly as bytecode and would be loaded by the JVM. This obviously becomes a
security problem if your program loads bytecode dynamically and makes
assumptions about its capabilities at the language level as opposed to the
bytecode level.

~~~
drdrey
If you load untrusted code dynamically, it strikes me as wrong to assume
anything about its capabilities. Even more so "at the language level".
Untrusted code can do anything unless you sandbox it.

~~~
anonymousDan
You're right but I guess it's perhaps easy to overlook, e.g. that if you
decompile/disassemble a valid bytecode program it might give you a program
that is not valid at the source level. Some interesting examples and
discussion here: [http://lambda-the-ultimate.org/node/5364](http://lambda-the-ultimate.org/node/5364)

------
tenkeyless
This is a well-known problem in the field of decompilers and disassemblers. The
pseudo-code they output for code produced by ordinary compilers is pretty good,
but when they encounter hand-written assembly or bytecode, they go places.

------
_old_dude_
Even the Java compiler uses that trick, for example with a bridge method.

    
    
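      import java.util.Arrays;
      import java.util.function.Supplier;

      public class BridgeDemo {
        public static void main(String[] args) {
          // Local class: get() narrows Supplier's erased Object get() to String.
          class Fun implements Supplier<String> {
            public String get() { return null; }
          }

          // Lists the methods declared by Fun itself: the String get() above
          // plus the compiler-generated bridge method Object get().
          Arrays.stream(Fun.class.getMethods())
            .filter(m -> m.getDeclaringClass() == Fun.class)
            .forEach(System.out::println);
        }
      }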

~~~
kuschku
why not use .getDeclaredMethods()?

~~~
_old_dude_
no real reason :)

------
0x0
Another oddity to think about in java source code: In regular Java, all
objects extend java.lang.Object, including java.lang.Class. So how do you
bootstrap building java.lang.Object from source?

~~~
seanmcdirmid
java.lang.Object is just part of the VM. Same thing with native methods.

~~~
0x0
While many of the methods in java.lang.Object have a native implementation,
there are still other methods that do sport a pure java implementation:
[http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/...](http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/java/lang/Object.java)

~~~
seanmcdirmid
Sure, but...if you look at the JVMS, there are a lot of special cases
concerning java.lang.Object to help with bootstrapping.

------
reitanqild
I call clickbait:

Author admits this is not valid Java. It is not even compilable.

If I read correctly it is just artifacts from a partially successful decompile.

Interesting, and this discussion is interesting, but this is not and has never
been valid Java.


