
Java's Original Sin (2009) - tosh
https://gbracha.blogspot.com/2009/05/original-sin.html
======
JanecekPetr
There is the project Valhalla now,
[https://wiki.openjdk.java.net/display/valhalla/Main](https://wiki.openjdk.java.net/display/valhalla/Main),
aiming to bring value types and eventually generic specialization and
reification.

The work on that project has been impressive, and there is now an experimental
EA build called LW2 where you can play with value types:
[https://wiki.openjdk.java.net/display/valhalla/LW2](https://wiki.openjdk.java.net/display/valhalla/LW2)

Will this solve everything? No. The char type is hosed. But there will be an
opportunity to e.g. get a new Character.
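
For the curious, a value type in the LW2 build looks roughly like the sketch
below. The keyword and details are prototype-specific and have changed
between Valhalla drafts, so treat it as illustrative rather than final
syntax:

    // LW2-era sketch: instances have no identity and can be flattened.
    public inline class Point {
        private final int x; // fields of an inline class are implicitly final
        private final int y;

        public Point(int x, int y) { this.x = x; this.y = y; }
        public int x() { return x; }
        public int y() { return y; }
    }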

~~~
MrBuddyCasino
To anyone looking: it seems as if the project has been dead since 2015, but
the prototype is in active development. Thanks for the link!

~~~
JanecekPetr
Thank you, I updated the link in my post to point to the up-to-date and
frequently updated wiki page for Valhalla information.

------
nitwit005
> Now, if characters were objects, their representation would be encapsulated,
> and nobody would be very much affected by how many bits are needed.

Making characters objects doesn't seem like it magics away this issue. You'd
need APIs to get and set raw values somewhere, and those APIs would become
the problem.

~~~
lmm
> Making characters objects doesn't seem like it magics away this issue. You'd
> need APIs to get and set raw values somewhere, and those APIs would become
> the problem.

You never need to "get and set raw values". You need APIs to do the things
that are sensible to do with characters, such as "test whether it has this
property" or "encode as UTF-16 byte sequence", but all those things have much
clearer semantics than "access raw value".
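
Java's existing codepoint-level methods already have this shape. A small
sketch using real java.lang and java.nio APIs, asking for abstract codepoints
and explicit encodings rather than raw bits:

    import java.nio.charset.StandardCharsets;

    public class CharacterOperations {
        public static void main(String[] args) {
            int cp = "é".codePointAt(0);                // ask for a codepoint
            System.out.println(Character.isLetter(cp)); // test a property
            // encode explicitly as a UTF-16 byte sequence
            byte[] utf16 = "é".getBytes(StandardCharsets.UTF_16);
            System.out.println(utf16.length);
        }
    }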

~~~
nullwasamistake
You need raw values for serialization over the wire and to files. And for any
kind of compression. And for searching (it's hard to build indexes in Unicode
because of uneven symbol widths).

You could convert to a different character format but at some point you need
raw values. If you want to be able to read text created by something else
(text editor, browser, different OS) you have to expose raw access.

~~~
lmm
> You need raw values for serialization over the wire and to files. And for
> any kind of compression.

You need to encode as bytes with a particular encoding. That's not the same
thing as "getting the raw value".

> And for searching (it's hard to build indexes in Unicode because of uneven
> symbol widths).

It may be hard, but it's important to do it right. If searching for "café"
(é as a single codepoint) doesn't find "caf◌́e" (e followed by a combining
accent) then you've got a problem.
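
In Java this is what java.text.Normalizer is for; a minimal sketch of
normalization-aware comparison:

    import java.text.Normalizer;

    public class NormalizedCompare {
        public static void main(String[] args) {
            String composed   = "caf\u00e9";  // é as a single codepoint
            String decomposed = "cafe\u0301"; // e plus combining acute accent
            System.out.println(composed.equals(decomposed)); // false
            System.out.println(
                Normalizer.normalize(composed, Normalizer.Form.NFC).equals(
                    Normalizer.normalize(decomposed, Normalizer.Form.NFC))); // true
        }
    }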

> You could convert to a different character format but at some point you need
> raw values. If you want to be able to read text created by something else
> (text editor, browser, different OS) you have to expose raw access.

Again, no. You need to be able to decode byte sequences that have particular
encodings into strings of characters. That doesn't mean you need to expose the
internal representation of those strings/characters.
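
To make that concrete in Java terms: decoding takes bytes plus an explicit
charset and yields a String, and the String's internals never appear:

    import java.nio.charset.StandardCharsets;

    public class DecodeBytes {
        public static void main(String[] args) {
            byte[] wire = {(byte) 0xC3, (byte) 0xA9};            // "é" in UTF-8
            String s = new String(wire, StandardCharsets.UTF_8); // bytes -> String
            byte[] back = s.getBytes(StandardCharsets.UTF_8);    // String -> bytes
            System.out.println(s + " round-trips as " + back.length + " bytes");
        }
    }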

~~~
nullwasamistake
> Again, no. You need to be able to decode byte sequences that have particular
> encodings into strings of characters.

You keep reiterating this, but it's not feasible. To avoid exposing the raw
values, the language would need to support all possible encodings.

~~~
lmm
Nonsense. What is it that you can do with a "raw value" that you can't do with
the representation in a particular encoding? Granted, if you wanted to import
a character that doesn't have a Unicode codepoint, you couldn't decode that
character from UTF-8. But if the language is built with no support for
non-Unicode characters, then access to the internal representation of a
character wouldn't help you either: the language's built-in character
functions, for things like checking the case of a character, won't handle a
non-Unicode character properly anyway.

~~~
nullwasamistake
In Java at least, you can write your own Charset implementation and the
language will then support it like any other. That mechanism relies on
exactly the raw byte access I'm ranting about.
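
For reference, the extension point is java.nio.charset.Charset (hooked up
via a java.nio.charset.spi.CharsetProvider); a skeleton of such an
implementation, with the coder bodies omitted:

    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CharsetEncoder;

    public class MyCharset extends Charset {
        public MyCharset() {
            super("X-MY-CHARSET", new String[0]); // name and aliases
        }

        @Override
        public boolean contains(Charset cs) {
            return cs instanceof MyCharset;
        }

        @Override
        public CharsetDecoder newDecoder() {
            // would translate incoming bytes into chars
            throw new UnsupportedOperationException("sketch only");
        }

        @Override
        public CharsetEncoder newEncoder() {
            // would translate chars into outgoing bytes
            throw new UnsupportedOperationException("sketch only");
        }
    }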

~~~
lmm
Yes and no: a Java Charset is something that can convert a buffer of bytes to
a buffer of UTF-16 code units. In Java that happens to be the internal
representation of a String, but it doesn't have to be; you just need built-in
support for encoding/decoding a string as UTF-16.
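
That is visible in the API itself: decoding yields a CharBuffer of UTF-16
code units, not access to any String internals. For example:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.StandardCharsets;

    public class DecodeToCodeUnits {
        public static void main(String[] args) {
            ByteBuffer bytes =
                ByteBuffer.wrap(new byte[]{(byte) 0xC3, (byte) 0xA9}); // "é" in UTF-8
            CharBuffer utf16 = StandardCharsets.UTF_8.decode(bytes);
            System.out.println(utf16); // é
        }
    }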

A simple proof of this is that you can write custom character sets in Python
too, even though there's no way to have raw byte access to a Python character
(because it's different on different platforms/builds).

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=636307](https://news.ycombinator.com/item?id=636307)

~~~
latchkey
This would be a nice feature for HN: any link that has previously been
discussed would show up with the total number of comments/votes displayed
next to it.

Edit: I noticed the 'past' link, but that takes you to the search engine. I'd
prefer something more direct like what @dang just posted.

~~~
hadrien01
I use the "Refined Hacker News" extension, and it adds links to previous
discussions at the bottom of the page

~~~
latchkey
Very nice, I currently use a hacked up version of HNES, but it isn't updated
anymore so I'll give this one a try.

------
bni
I read a long time ago that Java had primitive data types like char in order
to ease the migration of C code to Java.

Remember, this decision was made in the early '90s, and getting users onto
the language was imperative. The target audience was most likely using C at
the time.

------
willvarfar
When I read an article like this and come to different conclusions, it makes
me question whether I understood it correctly.

I have been bitten more than most by the 16-bit char. I completely agree with
the modern popular opinion that strings should be made from grapheme clusters,
or failing that, codepoints.

But I get confused when the author uses the example of int, because then we
run into problems. One confusion is the author's claim that a method naming
scheme is not operator overloading, when that's actually how operator
overloading is done in every language I can name. Another confusion: if
primitive int and Integer merge into Int, don't we then have problems with
null?

I've been a bit involved with the Mill CPU effort, and let's not go off on a
tangent about that, but one thing in there that is a thing of beauty is the
Not-A-Result/None approach. Values carry special metadata saying whether they
are an error or not. So, say, dividing by zero returns a value, but that
value is tagged as an error, rather than an exception being thrown. If you
use this value in subsequent computation, the error propagates. Only if you
then try to do something externally visible will the error throw. 'None' is a
special type of error that doesn't throw when it becomes visible, but instead
becomes a no-store. This model is excellent for the speculation inside CPUs,
but I posit that it would also be a really awesome model for higher-level
languages.

Imagine modelling null as 'None': a function returns a number, except
sometimes it returns None instead. This None is then used in subsequent math,
and the None-ness propagates. When you try to do something with it, it just
silently does nothing. Another time, a function returns an overflow error or
something like that, and when you eventually try to write that value to a
member variable, where it becomes visible to other code, an error is thrown.
I know first hand that when you internalize this, lots of error handling and
checking evaporates, and it's a nice zen.
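
A hedged Java sketch of that idea (the Nar class and its methods are
hypothetical, purely to show the propagate-until-visible behaviour):

    // Hypothetical sketch of Mill-style NaR/None propagation in Java.
    public final class Nar {
        private final double value;
        private final boolean isError; // e.g. tagged by a divide-by-zero
        private final boolean isNone;  // the silent variant

        private Nar(double value, boolean isError, boolean isNone) {
            this.value = value; this.isError = isError; this.isNone = isNone;
        }

        public static Nar of(double v) { return new Nar(v, false, false); }
        public static Nar error()      { return new Nar(0, true, false); }
        public static Nar none()       { return new Nar(0, false, true); }

        // Computation propagates the tag instead of throwing.
        public Nar div(Nar other) {
            if (isError || other.isError) return error();
            if (isNone || other.isNone)   return none();
            if (other.value == 0)         return error();
            return of(value / other.value);
        }

        // Only an externally visible effect forces the error out.
        public void store(double[] slot, int i) {
            if (isError) throw new ArithmeticException("NaR reached a store");
            if (isNone)  return; // no-store: silently does nothing
            slot[i] = value;
        }
    }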

Personally, I wish I had a faster and statically typed Python. I love Python's
infinite integers and its string handling (which I had no problem with in
python2 either, so ymmv). My biggest gripe with Python in production is that
it isn't statically typed.

~~~
chrisseaton
> Another confusion I have is that if primitive int and Integer merge into
> Int, then we have problems with null?

The author isn’t suggesting that this would be a design change that would be
compatible with the current semantics of Java, so behaviour for null would
change.

~~~
willvarfar
Yeap.

I can think of two main uses for Integer: when you have type polymorphism
(e.g. my own code often has maps of Object) or nullness.

I go on to describe a different approach to nullness that suits ints, for
example.

------
Causality1
Frankly I think Java's original sin was being sold to Oracle, and like the sin
mentioned in the original post we're all still paying for it.

~~~
mr_crankypants
It's certainly led to plenty of problems, but, having happened closer to now
than to the birth of Java, it's not really a candidate for "original".

------
tomohawk
You'd also have to disallow null as a value for any of these boxed types.

It's disconcerting and astonishing, to say the least, to check for
true/false and get an NPE.
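
The classic trap is auto-unboxing:

    public class UnboxNpe {
        public static void main(String[] args) {
            Boolean flag = null;
            if (flag) { // unboxes via flag.booleanValue(): NullPointerException
                System.out.println("never reached");
            }
        }
    }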

One nice thing about Go is that strings are values and cannot be null.

------
nerdponx
Isn't this basically the Python data model? Especially so in Python 3, where
strings are strings of unicode codepoints rather than bytes ("runes" like Go
would be even better but codepoints are halfway there).

Edit: In Python, equality, addition, even attribute setting can all be
overridden by implementing certain "magic" methods on a class. Much of Python
syntax is effectively syntactic sugar for calling these methods.

~~~
grahamedgecombe
> "runes" like Go would be even better but codepoints are halfway there

Go runes are codepoints.

I think Swift is interesting, a "character" in Swift is actually a grapheme
cluster.

~~~
shellac
Swift probably inherits that from NeXT/OpenSTEP/Cocoa, which was very
unusual at the time in its degree of Unicode support. For example, it used
something like Unicode equivalence in string comparisons.

(You'd see Objective-C benchmarks suffer because of this.)

------
googlemike
Java is a wildly successful language, and I find the title to be rather
clickbait-y in that it attempts to be polarizing for no reason. Original sin
implies a failure, while the author (of the blog) discusses a nuanced design
choice of the language. It is one of the most successful languages I know of
in terms of adoption and scale (alongside JavaScript, C++, and Python).

By what measure, if not this one, can we truly compare languages?

~~~
adamnemecek
The author was involved in the creation of Java. I don't think he's shitting
on Java per se.

~~~
shellac
He also guided the evolution of Java from 1997 until 2006, so from 1.1 to
1.6, a very significant period in Java's development.

He's also one of the team behind Dart. Hrm.

------
twhitmore
Clickbait in some respects. "Java should have been a different language from
what it was, more pure in OO terms, and without the characteristics that made
it understandable & popular for developers from C and similar backgrounds."

While the ideas are (somewhat) valid, it would have been a quite different
language from the language that gathered traction and became popular as Java.

~~~
twblalock
> While the ideas are (somewhat) valid, it would have been a quite different
> language from the language that gathered traction and became popular as
> Java.

Yep. Practicality beats purity when coding in industry, and the Java ecosystem
has almost always chosen practical solutions over ideal ones.

This is the intentional result of prioritizing practicality and usefulness and
portability.

People who write articles complaining about Java seem to think Java should
have different priorities, but Java has been one of the most popular languages
in industry for a long time because its priorities make sense.

~~~
jaredklewis
What part of Java having a more consistent API with identical performance
would have not been practical?

------
lolive
To me, the sin of Java has been to make String final. Or, to be more precise,
not to have made it generifiable, so the compiler would know the difference
between, let's say, String<FirstName> and String<LastName>.

~~~
kjeetgill
I'm not sure I understand your example. Do you want a string of FirstNames?
Or did you mean something like:

class FirstName extends String {...}

~~~
lolive
Something like a semantic-aware String, yes, with syntactic sugar to spare me
repetitive tasks, plus the ability to uplift existing code using String to
its typesafe version in the most straightforward manner. Something like the
generics system, but for String.

I can of course wrap my Strings in custom Objects. But that is unneeded for
90% of String usages, and it forces you to unbox the String from its wrapper
Object to pass it to a function that consumes just a String.
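
Since String is final, the closest you can get today is a wrapper type; a
minimal sketch of the boilerplate being complained about (FirstName is a
hypothetical example):

    public final class FirstName {
        private final String value;

        public FirstName(String value) { this.value = value; }

        // The manual "unboxing" step needed before calling any
        // String-consuming API.
        public String asString() { return value; }
    }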

------
saagarjha
> Besides, I’ve come to the conclusion that Java made the right call on array
> covariance in the first place.

I'm actually curious to hear why the author thinks this, because I've largely
come to the opposite conclusion…
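
For anyone unfamiliar with the trade-off, array covariance is what makes the
following compile and then fail at runtime:

    public class Covariance {
        public static void main(String[] args) {
            Object[] objects = new String[1]; // legal: String[] subtypes Object[]
            objects[0] = Integer.valueOf(42); // compiles; ArrayStoreException at runtime
        }
    }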

~~~
tosh
Dart's FAQ has an entry "Why are generics covariant?". Gilad Bracha was
involved in Dart's early language design decisions.

[https://dart.dev/faq#q-why-are-generics-covariant](https://dart.dev/faq#q-why-are-generics-covariant)

------
eternalban
Interestingly enough, this motivates me to overcome my long-standing aversion
to Scala and finally give it a try.

~~~
mr_crankypants
If you're just looking for a cleaner type system, you might give Kotlin a look
first. It gets you that, while remaining a much smaller and simpler language
than Scala.

~~~
kaidax
Kotlin is a much larger language than Scala in terms of syntax, has 100+
keywords and built-ins, and compiles slower, all while having a much weaker
OO system and doing a small fraction of what Scala does.

~~~
mr_crankypants
I haven't made a formal study, but I think it still ends up being a smaller
language. Scala doesn't have a lot of keywords, but it does have a _lot_ of
overloading of what those keywords do. It's something I have to spend quite a
bit of time explaining when I'm bringing people up to speed on Scala.

The OO system is absolutely weaker. Depending on your situation, that can be a
big advantage. I work in a mixed Java/Scala codebase, and, while I generally
like Scala, one of our more annoying bits of yak-shaving is making sure
outward-facing Scala code doesn't do anything to make itself overly awkward to
consume from Java. I haven't worked as extensively in Kotlin, but, from what
I've seen, you've got to spend a lot less time worrying about it there,
because Kotlin stays much closer to Java's OO semantics.

My sense is, if type erasure or the lack of parametric polymorphism is really
causing you pain, then yes, Scala is for you. Pattern matching, too. If your
Java pain points are more prosaic than that, Kotlin is a less radical change
that is likely to address most of them without introducing too many new ones.

------
smartstakestime
Well, how well did that work out for Ruby???

But this actually would have been better in Java, because the compiler could
have dealt with the inefficiency caused by the unnecessary overhead.

At the implementation level, make it an object; at runtime, it becomes a
scalar.
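
HotSpot does a version of this today via escape analysis and scalar
replacement: an object that provably never escapes can be broken into its
fields and never allocated. A sketch of the kind of code this applies to
(whether the JIT actually scalarizes it depends on the VM):

    public class ScalarReplaced {
        static final class Point {
            final int x, y;
            Point(int x, int y) { this.x = x; this.y = y; }
        }

        // With escape analysis enabled (-XX:+DoEscapeAnalysis, the default),
        // the JIT can keep x and y in registers instead of allocating Point.
        static long sum(int n) {
            long total = 0;
            for (int i = 0; i < n; i++) {
                Point p = new Point(i, i + 1); // never escapes this method
                total += p.x + p.y;
            }
            return total;
        }

        public static void main(String[] args) {
            System.out.println(sum(1_000_000));
        }
    }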

