
The Unsound Playground: Java and Scala's Type Systems are Unsound - theemathas
http://io.livecode.ch/learn/namin/unsound
======
derriz
It was unsound from the first version with respect to arrays. If B is a
subtype of A, then Java allows you to pass as an argument (or return) an array
of B where an array of A is specified in the type signature.

To see why this is broken, consider a method that takes an array of Animal and
sets the first element to be a Cat instance. Java's type system will allow you
to call this method with an array of Dog which will blow up at runtime on the
assignment to the array element.
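
A minimal sketch of that scenario, in case it helps. `Animal`, `Dog` and `Cat` are the classes from the description above; `ArrayCovarianceDemo`, `putCat` and `storeFailsAtRuntime` are names made up for illustration.

```java
class Animal {}
class Dog extends Animal {}
class Cat extends Animal {}

class ArrayCovarianceDemo {
    // As far as the compiler is concerned, any Animal[] is acceptable here.
    static void putCat(Animal[] animals) {
        animals[0] = new Cat();
    }

    // Returns true if storing a Cat into a Dog[] was rejected at run time.
    static boolean storeFailsAtRuntime() {
        Dog[] dogs = new Dog[1];
        try {
            putCat(dogs); // compiles: Dog[] is accepted where Animal[] is expected
            return false;
        } catch (ArrayStoreException e) {
            return true;  // the JVM checks the element type on every array store
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFailsAtRuntime()); // prints "true"
    }
}
```

The call type-checks, and the mistake only surfaces as an `ArrayStoreException` when the store executes.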

I noticed this immediately when I first started using Java (version 1.0.4!) as
I had just read the classic 1985 OO type systems paper by Cardelli. I didn't
have an OO background and was surprised that a "modern" language like Java
would have this flaw in its type system given that the problem was well known
years before.

~~~
MrBuddyCasino
Is this a defect or a tradeoff? Passing arrays of subtypes seems useful to me.
I never experienced such an error, but I suspect this is quickly caught and
shouldn't be a big problem.

But then I don't know anything about type system theory.

~~~
Groxx
Without generics (this is Java 1, remember) it seems like a reasonable
tradeoff. Otherwise[1] you can't have array-modifying polymorphic code at all
- how would you write a "reverse this array" function, for instance? You'd
need one for every kind of array you want to reverse. Or a "copy" function -
even if you're given a properly-sized-and-typed array to put the values in,
you can't make a function that'd work on anything except a single type.
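
A rough sketch of the kind of function I mean, assuming nothing beyond covariant arrays (`ReverseDemo` and `reverse` are illustrative names, not from any library):

```java
import java.util.Arrays;

class ReverseDemo {
    // One method serves every reference-type array, because String[],
    // Integer[] and so on are all accepted where Object[] is expected.
    static void reverse(Object[] items) {
        for (int i = 0, j = items.length - 1; i < j; i++, j--) {
            Object tmp = items[i];
            items[i] = items[j]; // safe: the element came out of this same array
            items[j] = tmp;
        }
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c"};
        Integer[] numbers = {1, 2, 3, 4};
        reverse(words);
        reverse(numbers);
        System.out.println(Arrays.toString(words));   // prints "[c, b, a]"
        System.out.println(Arrays.toString(numbers)); // prints "[4, 3, 2, 1]"
    }
}
```

This never trips the runtime store check because it only writes back elements it read from the same array. It still does not cover primitive arrays such as `int[]`, which are not `Object[]`.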

You could force every array-function-caller to cast their args to e.g.
`Object[]`, but that seems like it'd require more work for even less safety.
Though I'd love a read-only concept in Java, for read purposes :|

[1]: I think. If I'm wrong, someone please correct me!

~~~
RHSeeger
You could define what an array accepts vs returns separately (input vs
output). Admittedly, you're pretty much into generics (good ones, not what
Java has) at that point.

Edit: Could use a better word for "good" that sounds less dismissive, but I
couldn't think of one. Java's generics are lacking in certain areas,
unfortunately.

~~~
MrBuddyCasino
Generics in Java are pretty complicated already, so much so that very few
developers fully understand them. I'm not sure the added safety would be
worth the extra complexity.

I'd like to see examples in languages where this is possible, if there are
any?
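
For what it's worth, Java's own wildcards already express a use-site version of this input/output split for generic collections (though not for arrays). A small sketch, with `VarianceDemo`, `names` and `addCat` as made-up names:

```java
import java.util.ArrayList;
import java.util.List;

class Animal { String name() { return "animal"; } }
class Cat extends Animal { @Override String name() { return "cat"; } }

class VarianceDemo {
    // "Output" use only: elements can be read as Animal, but nothing can be added.
    static String names(List<? extends Animal> source) {
        StringBuilder sb = new StringBuilder();
        for (Animal a : source) {
            sb.append(a.name()).append(' ');
        }
        // source.add(new Cat()); // would not compile
        return sb.toString().trim();
    }

    // "Input" use only: Cats can be added, but reads only yield Object.
    static void addCat(List<? super Cat> sink) {
        sink.add(new Cat());
        // Cat c = sink.get(0); // would not compile
    }

    public static void main(String[] args) {
        List<Animal> animals = new ArrayList<>();
        animals.add(new Animal());
        addCat(animals);                     // List<Animal> accepts Cats
        System.out.println(names(animals));  // prints "animal cat"

        List<Cat> cats = new ArrayList<>();
        addCat(cats);
        System.out.println(names(cats));     // prints "cat"
    }
}
```

Languages with declaration-site variance let the class itself declare which direction each type parameter is used in; the sketch above only shows the per-use form that Java offers.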

------
aembleton
I used IntelliJ to convert the Java code into Kotlin and the compiler came up
with the following errors. It does seem to do a better job of handling types:

Error:(3, 39) Kotlin: Type argument is not within its bounds: should be
subtype of 'U'

Error:(5, 21) Kotlin: Type inference failed: 'B' cannot capture 'in T'. Type
parameter has an upper bound 'U' that cannot be satisfied capturing 'in'
projection

Error:(5, 28) Kotlin: Type mismatch: inferred type is Unsound.Constrain<U, in
T>? but Unsound.Constrain<U, ???> was expected

------
kagebe
In related news: Radu Grigore "Java generics are Turing complete" POPL'17
Preprint: [https://arxiv.org/pdf/1605.05274](https://arxiv.org/pdf/1605.05274)

------
shakna
Scala is broken by implicit constraints as well.

Makes me wonder how Dotty would hold up under similar conditions, or whether
this sort of calculus on type constraints cannot guarantee a correct return.

~~~
danarmak
The soundness of Dotty (or at least of DOT) was formally proven. In the
process they also demonstrated exactly what makes Scala 2 unsound. Links:
[http://dotty.epfl.ch/blog/2016/02/03/essence-of-scala](http://dotty.epfl.ch/blog/2016/02/03/essence-of-scala)
[http://dotty.epfl.ch/blog/2016/02/17/scaling-dot-soundness.h...](http://dotty.epfl.ch/blog/2016/02/17/scaling-dot-soundness.html)

------
andrewflnr
A sound type system is one that prevents illegal operations. Using a null
reference is an illegal operation, and Java doesn't prevent this. Hasn't
Java's type system been unsound all along?

~~~
saghm
I'm not sure I understand the term "illegal operation" here. I would assume it
means "an operation that isn't allowed", but in that case anything that is
allowed (e.g. using null) would by definition not be illegal. Can you clarify
what you mean by this term?

~~~
6502nerdface
Dereferencing null is allowed at compile time but disallowed at run time;
that's what makes it an illegal operation.

~~~
saghm
Ah, so the term in this case is just used to mean "something that causes a
runtime error"? I think I misunderstood because the comment I was responding
to said "using a null reference", which to me doesn't necessarily imply
dereferencing it (although that's open to interpretation of course).

------
shamsmali
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
The method upcast(Unsound.Constrain<U,B>, B) in the type Unsound.Bind<U> is
not applicable for the arguments (Unsound.Constrain<U,capture#1-of ? super T>,
T)

at Unsound.coerce(Unsound.java:13) at Unsound.main(Unsound.java:17)

~~~
aseipp
You're using the default example, but the error is mostly a red herring. The
paper explicitly acknowledges this problem, Section 3 pg. 3:

> ... As a consequence, the method type-checks. Furthermore, even though type-
> argument inference in Java is undecidable, this type argument is identified
> by javac, version 1.8.0_25, which consequently accepts and compiles the code
> in Figure 1. However, constraint solving is generally a non-deterministic
> process, and type-argument inference itself is non-deterministic [35], so
> the Eclipse Compiler for Java, ecj version 3.11.1.v20150902-1521, and the
> current compiler for the upcoming Java 9, javac build 108, fail to type-
> check this code...

Use the `Unsound9` example, which should work in multiple versions. These
discrepancies are merely due to some deficiencies in the way type inference
happens in Java, but the unsoundness principle is still valid, and you can
still construct invalid `coerce` functions.

------
shamsmali
I get a compilation error when trying this in Eclipse on Java 8, but
strangely, when I compile it outside of Eclipse, it goes through.

Exception in thread "main" java.lang.Error: Unresolved compilation problem:
The method upcast(Unsound.Constrain<U,B>, B) in the type Unsound.Bind<U> is
not applicable for the arguments (Unsound.Constrain<U,capture#1-of ? super T>,
T)

at Unsound.coerce(Unsound.java:13) at Unsound.main(Unsound.java:17)

------
olliej
The headline is misleading as the type system is sound, but the general idea
of generics is that the type bindings and constraints should result in a
guarantee that there won't be any failing casts at runtime. And they're
interpreting the failure to catch all detectable errors statically as a sign
of unsoundness, which is not a correct assertion.

The Java language's half-assed generics implementation works by type erasure -
so everything is boxed to Object. Because of that erasure my guess is that
they're not handling inner class type binding correctly (maybe they're not
able to? I don't know their compiler internals - fundamentally it _should_ be
possible). If anything I'd argue that this is a compiler bug.

Anyway, the end result is a type safe (and entirely sound) runtime and
language -- if it were not sound it would be possible to write code that
_should_ trigger a type failure, either at compile or at runtime, but that
doesn't. That is what it means to have an unsound type system, for example all
C-derived languages have unsound type systems. Remember that Java has no
guarantees of full static safety (the ability to downcast Object to anything
ensures that it can never be fully statically typed), so as long as you're
guaranteed that the runtime will never let an invalid cast or conversion happen
it's still correct. Another way of triggering this would be to use
introspection to instantiate an instance of a generic class. Because the VM is
unaware of generics you'll just get an Object interface through which you
could trigger the same faults.
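
To illustrate the erasure point, here is a minimal sketch (not taken from the paper or the playground; `ErasureDemo` and its method names are made up). It shows that the runtime sees a single class for every instantiation of a generic type, and that a wrongly typed element is only caught by the cast the compiler inserts where the value is used.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Both lists share one runtime class; the type arguments are erased.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // Goes through a raw type to put an Integer into a List<String>.
    // Returns true if the failure shows up as a ClassCastException at the read.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean castFailsAtUse() {
        List<String> strings = new ArrayList<>();
        List raw = strings;  // raw type: only a compiler warning
        raw.add(42);         // accepted at run time, the list just holds Objects
        try {
            String s = strings.get(0); // compiler-inserted cast to String fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
        System.out.println(castFailsAtUse());   // prints "true"
    }
}
```

Unlike the example in the article, this version needs a raw type and produces an unchecked warning, so the compiler has at least flagged the risk.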

Footnote: .NET actually gets generics right, it's not done through erasure but
actual correct type bindings. The VM is itself aware of parametric types and
instances of said types and so can statically verify the IL itself, so even if
you make an unsafe language that targets .NET the VM would reject
illformed/unsound IL, before running it: this is VM equivalent of a static
failure: it's not waiting for a dynamic cast to fail at runtime.

~~~
jhdevos
I don't see the headline as misleading at all. There are two type systems in
Java: the JVM's run-time type system, and Java's compile time type system.
It's the latter that has been shown to be unsound. As the paper notes, it's
fortunate that generics have been implemented with type erasure, because that
is what saves the JVM from being unsound, too.

As for the problem itself: it really has nothing to do with inner classes
(they are static in the example!). The problem is with this line:

    
    
        Constrain<U,? super T> constrain = null;
    

which leads the type system to assume that there is a type that is a
superclass of T and also a subclass of U (because of the bound on B in the
'Constrain' class), even though no such type could possibly exist! The evil
trick is in the 'null': even though a type with the stated constraints is
impossible, and so no actual 'real' value of this generic type could exist,
'null' is part of all types, so the compiler can't see the problem.

It really is a problem in the type system as defined in the standard, not just
a compiler bug.

~~~
olliej
Do you have a spec reference for that? (Not trying to be a dick - I just want
to understand what the type system believes it should be doing here, as the
value being assigned to a reference should not influence the type system
behaviour.)

~~~
jhdevos
This is explained in the article in a bit more detail. The point is not that
the type system does something with the actual value - the point is that it
would be impossible to create any code that results in such a value (other
than null), since then you would have to provide an actual type that is
somewhere between Integer and String - and that would obviously not pass the
type checking. Using null avoids that.

Section 4.5 in the language spec deals with parameterized types; 4.5.1 deals
with wildcards. Of course, there is no example in the specs that clearly
points out what happens in this example - otherwise this wouldn't have
remained undiscovered for so many years :)

If you can show, using the rules in the spec, that the sample program
shouldn't pass type checking, then you'll have proved the article wrong :-)

------
danking00
TLDR: We always knew nulls could float around and cause NPEs, but many
assumed they couldn't convince the compiler to unsafely convert, e.g., a
String to an Integer. However, since null can inhabit any type, we can use
null as a proof of any proposition, for example that there exists some super
type of Integer which is a subtype of String. Of course, if such a type exists
we should be able to traffic an Integer to a String.

The example on the page is pretty straightforward. I encourage everyone to
reason it out.

~~~
mjevans
I suppose that means that it is 'null' which is evil in Java. If you're going
to get rid of pointers, and you still want to have the option of a thing
existing or not, you therefore must represent it with a list and a contract on
the size of the list. (Zero entities, or one entity in the case of a 'single
object reference.)

Otherwise you'd only be able to detect 'missing' objects by exceptions being
thrown.

~~~
danking00
What you've described is precisely the Optional or Maybe type. Many folks
worry about the efficiency, which is reasonable, but feels misguided. A
nullable annotation would be sufficient for the safety we all crave and would
have no runtime overhead (i.e. it would still represent missingness with
null.)

~~~
lmm
The problem is that then your Optional is a special kind of type that only
works on reference types, and you either can't instantiate Optional<int> or it
has radically different performance characteristics from most Optionals.
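
To make the trade-off in this subthread concrete, here is a small sketch using `java.util.Optional` and `java.util.OptionalInt` from the standard library. `OptionalDemo`, `findName` and `parseDigit` are hypothetical names for illustration.

```java
import java.util.Optional;
import java.util.OptionalInt;

class OptionalDemo {
    // Absence is part of the return type instead of being an unchecked null.
    static Optional<String> findName(int id) {
        return id == 1 ? Optional.of("Ada") : Optional.empty();
    }

    // Optional<int> cannot be written in Java; primitives get a separate type.
    static OptionalInt parseDigit(char c) {
        return (c >= '0' && c <= '9') ? OptionalInt.of(c - '0') : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(findName(1).map(String::length).orElse(0)); // prints "3"
        System.out.println(findName(2).isPresent());                   // prints "false"
        System.out.println(parseDigit('7').getAsInt());                // prints "7"
        System.out.println(parseDigit('x').isPresent());               // prints "false"
    }
}
```

Note that `Optional` is itself a reference type, so nothing in the language stops an `Optional` variable from being null; the safety here is by convention rather than enforced by the type system.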

------
susan_hall
There have been some fantastic conversations on Hacker News about type
systems. I've learned a lot about the different arguments for dynamic typing,
versus strict typing, versus gradual typing. Java's type system may or may not
have represented new thinking when it was under development in the early
1990s, but it certainly seems a bit dated in the year 2016. The limits of the
Java type system have been discussed both here on HackerNews and also
elsewhere, many times.

For those who want to consider arguments against Java's style of strict
typing, 2 things I would recommend include "Agility & Robustness: Clojure
spec, by Stuart Halloway":

[https://www.youtube.com/watch?v=VNTQ-M_uSo8](https://www.youtube.com/watch?v=VNTQ-M_uSo8)

He offers a chart that shows the strengths and weaknesses of strict typing
versus unit tests versus the run-time checks offered by Spec. It's worth a
look.

The discussions around gradual typing have been interesting, but to see how
far the limits of this can be pushed, I would suggest everyone check out the
Qi/Shen programming language:

[https://en.wikipedia.org/wiki/Qi_(programming_language)](https://en.wikipedia.org/wiki/Qi_\(programming_language\))

"Qi makes use of the logical notation of sequent calculus to define types.
This type notation, under Qi’s interpretation, is actually a Turing complete
language in its own right. This notation allows Qi to assign extensible type
systems to Common Lisp libraries and is thought of as an extremely powerful
feature of the language."

This next quote is from someone who has spent a long time experimenting with
different Lisps:

"Qi (and its successor Shen) really push the limits of what we might call a
Fluchtpunkt Lisp. I suspect it requires a categorization of its own. A few
years ago I was looking for a Lisp to dive into and my searching uncovered two
extremely interesting options: Clojure and Qi. I eventually went with Clojure,
but in the intervening time I’ve managed to spend quality time with Qi and I
love what I’ve seen so far. Qi’s confluence of features, including an optional
type system (actually, its type system might be more accurately classified as
“skinnable”), pattern matching, and an embedded logic engine based on Prolog,
make it a very compelling choice indeed."

[http://blog.fogus.me/2011/05/03/the-german-school-of-lisp-2/](http://blog.fogus.me/2011/05/03/the-german-school-of-lisp-2/)

Mark Tarver, who created Shen, posted a comment and then turned it into an
essay here:

"The underlined sentence is a compact summary of the reluctance that
programmers often feel in migrating to statically typed languages – that they
are losing something, a degree of freedom that the writer identifies as
hampering creativity. Is this true? I will argue, to a degree – yes. A type
checker for a functional language is in essence, an inference engine; that is
to say, it is the machine embodiment of some formal system of proof. What we
know, and have known since Godel's incompleteness proof [9] [11], is that the
human ability to recognise truth transcends our ability to capture it
formally. In computing terms our ability to recognise something as correct
predates and can transcend our attempt to formalise the logic of our program.
Type checkers are not smarter than human programmers, they are simply faster
and more reliable, and our willingness to be subjugated to them arises from a
motivation to ensure our programs work. That said, not all type checkers are
equal. The more rudimentary and limited our formal system, the more we may
have to compromise on our natural coding impulses. A powerful type system and
inference engine can mitigate the constraints placed on what Racketnoob terms
our creativity. At the same time a sophisticated system makes more demands of
the programmer in terms of understanding. ... The invitation of adding types was
thus taken up by myself, and the journey to making this program type secure in
Shen emphasises the conclusion in this paragraph"

[http://www.shenlanguage.org/library/shenpaper.pdf](http://www.shenlanguage.org/library/shenpaper.pdf)

I'm only quoting two favorites of mine, but of course I could post a hundred
examples, all making a similar point. Java's style of strict typing is both
weak and incomplete, and yet overly rigid at the same time. It's worth noting
how many other strategies exist, that deliver more robustness, with greater
flexibility.

~~~
mncharity
Both the Qi and Shen pages on Wikipedia were deleted last year.

[https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...](https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Qi_\(programming_language\)_\(3rd_nomination\))
[https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...](https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Shen_\(programming_language\))

Deletionpedia doesn't currently have them. The Internet Archive has snapshots:

[http://web.archive.org/web/20141211005007/http://en.wikipedi...](http://web.archive.org/web/20141211005007/http://en.wikipedia.org/wiki/Qi_\(programming_language\))
[http://web.archive.org/web/20150102045719/http://en.wikipedi...](http://web.archive.org/web/20150102045719/http://en.wikipedia.org/wiki/Shen_\(programming_language\))

This is a long-standing problem (decade plus). Wikipedia deletionism interacts
badly with research programming language pages. They tend to get deleted for
"lack of notability", failing the test of "someone other than the people
involved, wrote about it on a sufficiently high-profile piece of dead tree".
And the pages face a recurrent threat of "heads you get deleted, tails we flip
again in a few years". Some pages have been through deleted/recreated/deleted-
again cycles. Some language communities manage to scrape together notability,
others browbeat the deletion nominator, but it seems most pages get deleted.
Fixing Wikipedia appears intractable.

Absent a wiki associated with something like LtU, creating an alternate wiki
has been beyond the capabilities of the programming language research
community. Which ends up reflected in balkanization - for example, people
working on category theoretic type hierarchies in different languages, being
unaware of each others' work. Shoemaker's children.

~~~
pjmlp
The day we lose the Internet it will be many times worse than the fire at
Alexandria's library.

------
Groxx
In case it makes reading easier for people, here's the code with the "A, B, T,
U" replaced with what they would expand to. The "that can't work" comes out
pretty clear with the equivalent of "Integer-superclass which extends String"
which is obvious nonsense.

    
    
        class Unsound {
          static class Constrain<AString, BSuperInteger extends AString> {}
          static class Bind<AString> {
            <BSuperInteger extends AString>
            AString upcast(Constrain<AString, BSuperInteger> constrain, BSuperInteger b) {
              return b;
            }
          }
          static <TInteger, UString> UString coerce(TInteger t) {
            Constrain<UString, ? super TInteger> constrain = null;
            Bind<UString> bind = new Bind<UString>();
            return bind.upcast(constrain, t);
          }
          public static void main(String[] args) {
            String zero = Unsound.<Integer, String>coerce(0);
          }
        }
    

I left the T, U, A, B as part of the name, so it's easier to map back to the
original.

I wish people would stop insisting on single-character type variables. They
have a purpose - name them meaningfully, like everything else.

------
gravypod
I ran the example and I got the same result. The result that is provided, and
the one I got, makes sense. Can someone explain to me what is "broken" about
this?

Broken down you:

Create 3 classes:

- Unsound: Unsound contains a static generic method. Two types are requested
  (T and U). T is the parameter and the U is the return type. From now on T
  will be called CoerceInputType and U will be called CoerceOutputType
- Constrain: A class that has two generic types it requests. A being the
  first type, and B who is the second type. A restriction is placed on B
  saying that B extends A. We will call A ConstraintType and B
  ExtendedConstraintType
- Bind: A class that takes one generic parameter of type A. Here we will call
  A BindType

What is happening here?

Your execution stack:

    
    
       - *Call* Unsound.coerce. Tell it CoerceInputType is an Integer and CoerceOutputType is a String. You pass into coerce a value of '0' with the type CoerceInputType (an Integer class) and accept a result with the return type CoerceOutputType (a String class) 
       - *Assign* In Unsound.coerce you define a type Constraint<CoerceOutputType, ANYTHING extends CoerceInputType>. This means that these type annotations will be propagated through anything this is passed into. 
       - *Assign* you create an instance of bind with the type CoerceOutputType.
    

What this currently looks like

    
    
       <Output, AnyType that extends Input> Output ourFunction(Input i) {
          return i
       } 
    

What happens: You get an exception as Integer is not castable to a String. You
can look further down to the only link in this page to see where that cast is
coming from in the bytecode.

The authors seem to be surprised that this will not execute? Or maybe that
this doesn't? This is completely expected behavior. Maybe not to someone who
hasn't worked in Java and is coming from another VM but for me this is not
surprising. Why? Here's a list:

The ways you can trigger a type change:

    
    
       1. Auto boxing
       2. Casting
    

For integers and strings, these two type conversions will not work. Here is a
much more direct explanation of what's happening here:

    
    
       class Test {
          public static void main(String[] args) {
             String a = "Hello World";
             Integer b = 10;
             a = b;
          }
       }
    

Compiling this you will get the following error:

    
    
       test.java:5: error: incompatible types: Integer cannot be converted to String
       		a = b;
       		    ^
       1 error
    

This is because Integer cannot auto (un)box to String. This error happens
at compile time. I'll get into why in a bit.

You might also ask, "But gravypod, what about an explicit cast, will that
work?" No, it will not!

    
    
       test.java:5: error: incompatible types: Integer cannot be converted to String
       		a =(String) b;
       		            ^
       1 error
    

This will not work at compile time either.

Now for the next section: why will this not be picked up at compile time? Well
that's a hard one to answer, the basic answer is the method isn't existent
yet. Or to be closer to the truth, the method's types don't exist yet. When
you use generics you're creating a method that will work on a base set of
types. There isn't necessarily hard type checking on generics though. These
are mainly there for the programmer, not the runtime. The runtime will operate
on any data and is just expecting casting to change between types.

When run though a decompiler this is the output of Unsound.class:
[http://hastebin.com/sitamomogu.java](http://hastebin.com/sitamomogu.java)

All that has happened is the compiler couldn't follow the type annotations at
compile time and it was left for runtime. This should be improved, but I don't
see what is broken. The time of the error just shifted. This is the same thing
that happens with null. The error is not compile time when you are going to
have an NPE. Instead they are runtime.

Here is example code that compiles perfectly fine:

    
    
       public class Test {
       	public static void main(String[] args) {
       		String a = null;
       		System.out.println(a.getClass().getSimpleName());
       	}
       }
    

When run you will get this exception:

    
    
       Exception in thread "main" java.lang.NullPointerException
       	at Test.main(Test.java:7)
    

I don't think that this is broken behavior as this is consistent with the way
the JVM is meant to operate. Types are suggestions to the VM. This to me is a
side effect of auto (un)boxing.

Maybe I'm wrong. I hope someone could tell me why if so.

Edit: Please read
[https://news.ycombinator.com/item?id=13051036](https://news.ycombinator.com/item?id=13051036)
as this explanation is much more elegant and gets the idea across in a simpler
way.

~~~
Groxx
The unsoundness is more straightforward than that:
[https://news.ycombinator.com/item?id=13051659](https://news.ycombinator.com/item?id=13051659)

By passing `null`, the compiler doesn't notice that the type constraints are
irrational, leading to a violation - to satisfy them, it'd need an Integer
superclass which is a subclass of String.

Arguably that's fine, since the null can't be used to do anything - it's
perfectly "safe" to insert a `String x = null;` into a `new
ArrayList<Integer>().add(x)` for instance, since it's just a null. But writing
that code gives you a compiler error since it isn't allowed by the type system
- writing the same thing via generics should too, but instead it fails at
runtime.

------
rosstex
The code doesn't seem to change on click.

EDIT: Don't downvote, is anyone else having this problem?

~~~
theemathas
Weird. Doesn't work for me either. ¯\_(ツ)_/¯

You can still get some of the code from the linked paper, though.

~~~
namin
What browsers are you using? Can you file an issue?

Thanks.

~~~
SomeCallMeTim
Tried in latest Edge, Firefox (50) _and_ Chrome (54) on Windows.

Clicking Scala gives me Scala. Clicking Java again gives me Java. But none of
the variants underneath make any difference to the code.

~~~
nimchimpsky
[https://github.com/namin/unsound](https://github.com/namin/unsound)

