
JRuby Creator's Short List of Key Missing JVM Features - gthank
http://blog.headius.com/2010/06/my-short-list-of-key-missing-jvm.html
======
strlen
First, I want to say that irrespective of its flaws, the JVM is an awesome
piece of software. There's a lot of really exciting work still going on: the G1
collector, Azul's open source contributions, the register-based Dalvik VM. Ten
years ago few people would have imagined distributed databases running on the
JVM that are fast enough to saturate the network, industrial-scale JVM-based
HPC, or 20 GB heaps running under heavy production load without any
GC-related problems in sight. For that matter, few people would have imagined
Scala and Clojure either. So don't take what I am about to say as yet another
rant from a developer bashing his tools.

There's a great deal of optimization going into the JVM as a server-side VM,
but unfortunately less work is going into it as a client VM. A lot of the
advanced options that exist in the server version of HotSpot aren't available
in the client version. Azul's recently open-sourced tweaks are also aimed at
the server-side market.

I fully understand the reason for this: Microsoft dominates the "thick client"
market, and the money for Java is on the server side (with JavaScript client
code running in the browser). The few Java desktop apps that I use are
development-related: YourKit profiler, JVisualVM, IntelliJ, the
Processing-based Arduino IDE and Eclipse CDT (for the excellent AVR plugin).
They're actually excellent, but (with the exception of the lightweight
Processing/Arduino IDEs) they're aimed at developers who are familiar with the
Java platform and don't mind occasional rough edges. I had to tune IntelliJ's GC
settings (enabling CompressedOops on 64-bit Linux, using a 32-bit JVM and the
CMS collector on OS X, adjusting the sizes of the various generations, etc.) to
get it to stop locking up on me. Given that I work on memory-intensive Java and
Scala applications, that's not a problem for me. Imagine, however, a word
processor that required this ("hey mom, it really is called 'CompressedOops',
one word, capital C and capital O!").
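
Here is a minimal sketch for checking which of these settings a running JVM actually picked up. It uses the standard java.lang.management beans. It also uses the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, which assumes a HotSpot-derived JVM. The class name VmSettings is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class VmSettings {
    // Names of the collectors the JVM actually selected
    // (the exact names vary by JVM vendor and version).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Current value of a HotSpot -XX flag, or "unavailable" when this JVM
    // does not expose the bean or the flag (e.g. non-HotSpot VMs).
    static String hotspotFlag(String name) {
        try {
            HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return hs.getVMOption(name).getValue();
        } catch (RuntimeException e) {   // IllegalArgumentException, NullPointerException
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("Collectors:        " + collectorNames());
        System.out.println("UseCompressedOops: " + hotspotFlag("UseCompressedOops"));
        System.out.println("Max heap (MB):     " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

Run it with the same -XX and -Xmx options you give the application to see what those options resolve to.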

On the other hand, Qt provides many utilities for C++ beyond the UI and has
bindings for Ruby. It's cross-platform across OS X, Linux and Windows. Qt and
C++ would be the route I'd take were I to build a cross-platform application.
I am skilled enough with valgrind and honestly don't find the lack of a GC to
be a major problem (especially for desktop programming, which doesn't
frequently involve complex parallel algorithms that are difficult to implement
with manual memory management). Qt provides a great deal of what one expects
from a high-level platform like the JVM: concurrency libraries beyond
primitives, simplifications for building event/callback-driven applications,
additional collections and even an IoC container.

If I didn't mind tying myself to Microsoft's standards, I'd also take a
serious look at Mono. Mono's GC is primitive compared to HotSpot's when it
comes to server-side applications, but I've yet to hear of people having to
tune it in order to run Mono-based desktop apps such as the (excellent) F-Spot.
Garbage collection is a long-solved problem, and I find it hard to believe that
client VMs couldn't come with sane defaults out of the box. In this way the JVM
reminds me of Linux in the mid-90s/early 00s (before distributions like Ubuntu
and extensive driver support): a lot of work went into scaling to thousands of
processors and new architectures, while getting desktop sound and video cards
working required recompiling the kernel.

Finally, the author really hits the mark with the lack of a POSIX API built
into the JVM. What's even more striking is that this can't be explained by the
"focus on the enterprise server market" theme. Server-side, Linux is almost a
monoculture, with occasional use of Solaris or OS X (the latter frequently as a
development environment). The few non-POSIX platforms (Windows, mainframes)
aren't a large part of the Java market and have POSIX compatibility layers
available. (Bonus point: why not support both, like Perl does, with builtins
for most system calls but also an excellent Win32 module?)

POSIX-JNA is available and (from what I hear) is an excellent library, but its
GPL license makes it incompatible with Apache-licensed projects (ASL 2.0 being
the de facto standard in the Java world). A minimal interface to POSIX, coupled
with a "systemcall()" method (allowing easy use of Linux-specific extensions),
_should_ be a standard part of the JDK. Python and Perl offer this without
sacrificing portability and safety, so why can't Java?

~~~
nl
I'm not sure this is what you want to hear, but you might as well give up on
Java on the client. No one cares anymore.

The only big company shipping client-side Java software applications is IBM,
with Eclipse and their other Eclipse-based products.

Oracle quite plainly doesn't care, and without significant investment it's
just not worth the pain anymore.

I'm a Java developer, and I've recently been looking into developing a
cross-platform client app. Java just isn't a serious player: the reasons I'm
looking at a client app (basically hardware access) are exactly the areas where
Java is spectacularly weak.

(I'm excluding Android from this, because that is a special case)

~~~
Nelson69
Why does it matter if "anyone cares?"

Nobody (generally speaking, on the grand scale) is using Lisp for much of
anything. The same is true of Ruby: you'll be hard-pressed to find a "serious"
or "major" company actually shipping Ruby. Looked at in those terms, you have
two options, C++ on Windows and Objective-C on Apple; those are the only
interesting client app platforms that the "major players" are doing anything
with. Maybe you throw .NET into the mix, but that looks a lot more like a
Windows-only 'Java' for enterprise server apps than a client platform.

I could see the concern about major-player adoption mattering if Java were
going away. That doesn't seem likely anywhere except maybe on Apple, and even
there I doubt it will happen until Oracle or another third party "takes over"
Java on OS X. I can't see it becoming completely unavailable. Now, Java isn't
great at fancy platform integration. The Mac menu and Windows tray support seem
a bit fragile, interest has been low enough that it doesn't have a ribbon menu,
and various modern animated UI components can be a lot of work in Java.
Depending on what you need the client app to do, though, it still seems like a
totally viable platform. If it does what you need, what does any of the other
stuff matter?

Juniper's VPN client is written in Java.

~~~
nl
_Why does it matter if "anyone cares?"_

Because the future of the platform is important if you are betting an
application on it.

 _Now Java isn't great at doing the fancy platform integration, the Mac menu
and windows tray support seem a bit fragile and the interest has been low
enough that it doesn't have a ribbon menu and various modern animated UI
components can be a lot of work in Java. Depending upon what you need the
client app to do though it still seems like a totally viable platform. If it
does what you need, what does any of the other stuff matter?_

It doesn't do what I need. Its direction as a platform also indicates there is
no point working on the things I need myself, because then I'd have to support
them forever.

For most of the major scripting languages (Ruby, Python, etc.) there are pretty
decent libraries for platform integration, and they are actively used by
multiple developers. If I need to patch something, then there is every chance
that the patch will be accepted back into a broadly used library.

With Java, the ecosystem of client side developers isn't very large, and there
aren't any big companies putting resources into it either. It's just stagnant,
and that isn't good enough.

------
donw
I've been wondering for a while whether the JVM really has a lot of runway
left. Java 7 has been amazingly slow in coming, and it'll be at least six
months, and more likely a year, before even early adopters can really use it in
production.

Startup time is a very real problem, as is memory use compared to other,
similar VMs (V8, LLVM).
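
One rough way to put numbers on both from inside a Java process is sketched below. `RuntimeMXBean.getUptime()` read at the top of main approximates the time the VM took to reach user code, and `MemoryMXBean` reports heap use. This is only the JVM side. A fair comparison with V8 would need an equivalent measurement there, which this does not attempt. The class name is made up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class StartupCost {
    // Milliseconds since the JVM started. Read first thing in main, this roughly
    // covers VM boot plus class loading (loading the management classes adds a bit).
    static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    // Heap currently in use and committed, in kilobytes (non-heap memory not included).
    static long heapUsedKb() {
        return heap().getUsed() / 1024;
    }

    static long heapCommittedKb() {
        return heap().getCommitted() / 1024;
    }

    private static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        long startup = uptimeMillis();
        System.out.println("time to main:   " + startup + " ms");
        System.out.println("heap used:      " + heapUsedKb() + " KB");
        System.out.println("heap committed: " + heapCommittedKb() + " KB");
    }
}
```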

With closures (lambda expressions), Java will be a lot more useful, but my
money is honestly on V8 and JavaScript at this point. It's moving faster, and
the node.js guys are very right that JavaScript is the language of the web.

~~~
noelwelsh
The JVM does a huge amount more than V8/node.js. If your world consists solely
of shuffling bytes around a network, then node.js may be a good solution, but
it doesn't extend much further than that. For example, I wouldn't want to
write a machine learning algorithm in JavaScript, nor a storage engine, nor a
... you get the idea. Furthermore, while node.js may be growing quickly (easy
when you're small!), the development of the JavaScript language is taking a
rather tortuous route as the various vendors play games at the ECMAScript
table.

To return to the article, what I really want from both the JVM and JavaScript
is proper tail calls (tail-call elimination). I'd also like proper lexical
scoping in JavaScript, though that isn't so important. I don't view Java or
JavaScript as languages you write, but rather as languages you compile to
(well, you compile to bytecode for the JVM, but hopefully you get my point).
Scala and Clojure make mighty fine Java replacements; I haven't yet seen
anything for JavaScript that is much of an improvement.
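
The JVM has no tail-call instruction, so languages targeting it work around the gap. Scala's `@tailrec` rewrites self-recursion into a loop, and Clojure offers explicit `recur` and `trampoline`. Below is a minimal Java sketch of the trampoline technique; the names `Trampoline`, `done`, `more` and `Even` are illustrative, not from any library. Each step returns either a result or a thunk for the next call, and a loop drives the thunks, so the stack stays flat.

```java
import java.util.function.Supplier;

// One step of a computation: either a finished value or a thunk for the next step.
interface Trampoline<T> {
    boolean isDone();
    T value();
    Trampoline<T> next();

    static <T> Trampoline<T> done(T v) {
        return new Trampoline<T>() {
            public boolean isDone() { return true; }
            public T value() { return v; }
            public Trampoline<T> next() { throw new IllegalStateException("finished"); }
        };
    }

    static <T> Trampoline<T> more(Supplier<Trampoline<T>> step) {
        return new Trampoline<T>() {
            public boolean isDone() { return false; }
            public T value() { throw new IllegalStateException("not finished"); }
            public Trampoline<T> next() { return step.get(); }
        };
    }

    // Run steps in a loop, so each "tail call" costs a heap object, not a stack frame.
    default T run() {
        Trampoline<T> t = this;
        while (!t.isDone()) {
            t = t.next();
        }
        return t.value();
    }
}

class Even {
    // Mutually recursive tail calls. Written as plain recursion this would overflow
    // the stack for large n; trampolined, it runs at constant stack depth.
    static Trampoline<Boolean> isEven(long n) {
        return n == 0 ? Trampoline.done(true) : Trampoline.more(() -> isOdd(n - 1));
    }

    static Trampoline<Boolean> isOdd(long n) {
        return n == 0 ? Trampoline.done(false) : Trampoline.more(() -> isEven(n - 1));
    }

    public static void main(String[] args) {
        System.out.println(isEven(1_000_000).run()); // true
    }
}
```

The cost is one allocation per call instead of a real jump, which is why people keep asking for tail calls in the VM itself.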

~~~
sanderjd
Could you list some reasons why you wouldn't want to write a machine learning
algorithm, a storage engine or a ... in JavaScript? Lack of libraries? Too high
level? Or do you not like the syntax/semantics/...?

~~~
noelwelsh
Performance: JavaScript VMs are not optimised for numeric code.

Lack of libraries.

I don't particularly like JS for involved projects (e.g. there's no module
system, so you have to write your own).

------
stcredzero
It seems like a "tracing JIT" is really just "advanced compiler hints from
runtime analysis". There's no reason these hints couldn't be gathered by tools
and saved in a file to be used by compilers. Combine this with very advanced
symbolic debugging and very rapid compilation, and there would no longer be any
need for late-binding dynamic languages. (Says the lover of Smalltalk, no
less! It would have to come with debugging on the level of Smalltalk or
IPython, and then some.)

~~~
headius
And true to form, on the JRuby project we're looking into ways to optimize
around the places where the JVM doesn't quite serve us well. I've recently been
experimenting with doing my own dynamic optimization passes, and that led me
to think about how to save this dynopt information to disk for instant
gratification on future runs.
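
Here is a toy sketch of the "save dynopt information to disk" idea. It is not how JRuby or HotSpot actually does this. It assumes an interpreter that can count calls by method name. Counts are written to a properties file at the end of a run, and the next run treats anything that crossed a threshold as a candidate for eager optimization. `ProfileStore`, the file format and the threshold are all hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical persistent call-count profile.
class ProfileStore {
    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    // Record one invocation (e.g. called from an interpreter's dispatch loop).
    void record(String method) {
        counts.merge(method, 1L, Long::sum);
    }

    // Persist the counts so a later run can start warm.
    void save(Path file) throws IOException {
        Properties p = new Properties();
        counts.forEach((k, v) -> p.setProperty(k, Long.toString(v)));
        try (Writer w = Files.newBufferedWriter(file)) {
            p.store(w, "call counts");
        }
    }

    // Load a previous run's profile and return the methods at or above the threshold.
    static Set<String> hotMethods(Path file, long threshold) throws IOException {
        Set<String> hot = new TreeSet<>();
        if (!Files.exists(file)) {
            return hot; // first run: no profile yet
        }
        Properties p = new Properties();
        try (Reader r = Files.newBufferedReader(file)) {
            p.load(r);
        }
        for (String name : p.stringPropertyNames()) {
            if (Long.parseLong(p.getProperty(name)) >= threshold) {
                hot.add(name);
            }
        }
        return hot;
    }
}
```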

I think part of the problem with the JVM is that for too long its requirements
have been driven by the big EE server folks. Their needs are almost completely
different from those of day-to-day developers, client app developers, RIA
developers, and so on. The key here might be making a good business case for
those other domains, to help push the JVM in the direction they need. I
actually have hope that the (perhaps misguided) push for JavaFX at Sun/Oracle
will bear fruit for client-side and non-giant-server-app domains, since I know
the JavaFX team has butted heads with a lot of the same problems we've faced
in JRuby.

~~~
stcredzero
I was thinking about applying this to Go.

------
10ren
When he says "object serialization" is broken, does he mean Java's built-in
object serialization (JOS), or the concept of serialization independent of any
specific implementation? He later says "default serialization", which is what
JOS calls the serialization of an object whose serialization hasn't been
customized. But maybe he means JOS itself?

Anyway: for JOS, you don't need to provide no-arg constructors and you don't
need to un-final fields, because JOS extralinguistically both creates objects
without calling their constructors and sets final fields. JOS also provides
hooks for you to initialize objects. It has many other hooks that few people
use.
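
Here is a minimal demonstration of that claim, using a made-up `Point` class. The class has a final field and no no-arg constructor, yet it round-trips. Its own constructor is never invoked during deserialization: per the serialization spec, JOS runs only the no-arg constructor of the first non-serializable superclass (here `Object`), then restores the fields, final ones included, from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    static int constructorCalls = 0;

    private final int x;   // final field, and no no-arg constructor anywhere

    Point(int x) {
        this.x = x;
        constructorCalls++;
    }

    int x() { return x; }

    // Serialize to a byte array and read it back.
    static Point roundTrip(Point p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = roundTrip(new Point(42));
        // Prints "42, constructor calls: 1": the final field was restored
        // without running Point(int) a second time.
        System.out.println(copy.x() + ", constructor calls: " + constructorCalls);
    }
}
```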

It's true that some aspects of how JOS implements what it does are ugly, but
much of what it sets out to do is necessary for full-featured serialization
that can work over the network. Serialization in other languages hasn't shown
a "right" way to do it that I'm aware of.

I'm interested to hear more about the problems Charles found with JOS, and
whether he has misunderstandings about this arcane bit of Java, or if it's me
who's misunderstood his very brief aside on it here.

~~~
headius
I mean the built-in serialization. Outside of the classloading and security
hacks required to make it work, the fact is that it performs at its worst if
you _don't_ do things like provide a no-arg constructor and non-final fields.
In those cases, the amount of reflective hackery required behind the scenes is
absolutely dreadful, and in benchmarking a simple graph of objects recently I
saw that 99% of the time was spent doing reflective access. That's absurd.

Rewriting to use Externalizable was a painful process, but it was orders of
magnitude faster than builtin serialization. The default serialization
mechanism is basically unusably slow for any high-throughput purposes.
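
For comparison, here is a minimal `Externalizable` sketch; the `Node` class and its fields are made up. It shows the trade-off described above. Externalizable requires a public no-arg constructor, which JOS calls before `readExternal`, so the fields can no longer be final. The class writes and reads its own fields explicitly instead of JOS reflecting over them.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

class Node implements Externalizable {
    // Cannot be final: they are assigned in readExternal, after construction.
    private String name;
    private int weight;

    // Required by Externalizable; JOS invokes it before calling readExternal.
    public Node() {}

    public Node(String name, int weight) {
        this.name = name;
        this.weight = weight;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(weight);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Must read in exactly the order writeExternal wrote.
        name = in.readUTF();
        weight = in.readInt();
    }

    String name() { return name; }
    int weight() { return weight; }
}
```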

I have not seen the hooks you describe for user-driven initialization of
classes, and unfortunately most of the resources I consulted online while
trying to write fast deserialization logic recently didn't mention them
either. Got a link? I'm certainly willing to learn what I'm doing wrong.

~~~
10ren
True, reflection is slower than regular access, though JOS does a fair bit of
caching to avoid some of the cost (for when you serialize many instances of
the same class). I think your experience with rewriting to use Externalizable
shows that JOS is one of those performance vs. ease-of-use trade-offs.

BTW: Are you sure that no-arg constructors and un-finaling fields make a
significant performance difference? The only instance of this I've come across
showed the (surprising) result that JOS setting final fields is actually faster
than reflectively setting non-final fields, but I haven't profiled that
explicitly. I would expect deserialization with a no-arg constructor to be
slower, because then it has to call the constructor in addition to actually
creating the object (allocating memory, etc.). But I haven't profiled this
either.

You can customize the initialization of a class by creating a _readObject()_
method for it, which is a sort of extra-linguistic constructor. It is
confusingly named the same as the method that you call to start
deserialization - but here it is a callback method that you write, that JOS
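itself will call:

    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.Serializable;

    class Example implements Serializable {   // placeholder class name
        private void readObject(ObjectInputStream ois)
                throws IOException, ClassNotFoundException {
            ois.defaultReadObject();   // "default" deserialization
            // ... your initialization here ...
        }
    }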

You can also read data from the stream explicitly, and set the class fields
yourself - if so, you need to write a corresponding writeObject() method that
writes the data to the stream. By avoiding reflection, this should be faster
than the "default" (but again I haven't profiled this). If you don't provide a
_readObject_ , it will call _defaultReadObject()_ for you by, you know,
default. Same for _writeObject()_.
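
Here is a sketch of that pattern with a made-up `Account` class. The fields are marked `transient` so the default mechanism skips them, and the private `writeObject`/`readObject` pair writes and reads them explicitly, in the same order. Whether this actually beats the default reflective path is, as noted, unmeasured here. Also note the fields cannot be final, because `readObject` assigns them.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: excluded from default serialization, written by hand below.
    private transient String owner;
    private transient long balance;

    Account(String owner, long balance) {
        this.owner = owner;
        this.balance = balance;
    }

    String owner() { return owner; }
    long balance() { return balance; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();   // no non-transient fields now, but keeps the format evolvable
        out.writeUTF(owner);
        out.writeLong(balance);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Read in the same order writeObject wrote.
        owner = in.readUTF();
        balance = in.readLong();
    }
}
```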

Googling for the specific methods turns up many tutorials. The following docs
are for serialization in Java 1.3, but these "basics" haven't changed: for
_readObject()_
[http://java.sun.com/j2se/1.3/docs/guide/serialization/spec/i...](http://java.sun.com/j2se/1.3/docs/guide/serialization/spec/input.doc4.html#2971)
(section 3.1 has the API, with _defaultReadObject()_ in it). Here's the
overall contents page:
[http://java.sun.com/j2se/1.3/docs/guide/serialization/spec/s...](http://java.sun.com/j2se/1.3/docs/guide/serialization/spec/serialTOC.doc.html)

There are also hooks for validating the object graph after _all_ of it has
been deserialized, and for replacing (resolving) one object with another.
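
Here is a sketch of both hooks, using made-up classes. `readResolve()` lets a class substitute another object for the one just read, which is the usual way to keep a serializable singleton unique. `ObjectInputStream.registerValidation()` queues an `ObjectInputValidation` callback that runs once the whole graph has been restored, before the outer `readObject()` call returns.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectInputValidation;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical singleton: readResolve() swaps each deserialized copy for the
// canonical instance, so identity comparisons keep working.
class Registry implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Registry INSTANCE = new Registry();

    private Registry() {}

    private Object readResolve() {
        return INSTANCE;
    }
}

// Hypothetical range whose invariant is checked after the whole graph is read.
class Range implements Serializable, ObjectInputValidation {
    private static final long serialVersionUID = 1L;
    private final int lo;
    private final int hi;

    Range(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Queue validateObject() to run once the complete graph has been restored.
        in.registerValidation(this, 0);
    }

    @Override
    public void validateObject() throws InvalidObjectException {
        if (lo > hi) {
            throw new InvalidObjectException("lo > hi: " + lo + " > " + hi);
        }
    }

    int lo() { return lo; }
    int hi() { return hi; }
}

class Hooks {
    // Serialize an object to bytes and read it back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```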

I know something of JOS, but not much of its performance, which seems to be
your crucial requirement. So I might not be much help, but let me know if
you'd like more info or if I've misunderstood something (I might not reply
until tomorrow; it's 3am here).

~~~
headius
Ahh, I see where the confusion comes from. I meant initializing in the Ruby
way...which is analogous to construction. There's no way for you to specify
_how_ to construct the object being deserialized from the stream, and so you
need a no-arg version of the constructor (or can allow JOS to generate one for
you) and have to do the logic that would be in-constructor in a separate piece
of code like readObject. And in my latest experiments with Externalizable, the
JOS code being unable to construct objects the way I want it to has become the
latest bottleneck.

It also interferes with us serializing objects for which we _do_ want to have
final fields initialized on construction. If we want to avoid the reflective
construction, we need a no-arg constructor. To have a no-arg constructor, we
can't make important fields final. And if we want a particular value to be
passed into the constructor, we always need to do it in readObject, without
_any_ context provided as to where and when serialization is being called.
That's so cumbersome that we simply can't do it.
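
One standard workaround that keeps real constructors and final fields is the serialization proxy pattern, popularized by Joshua Bloch's Effective Java. It is not something proposed in the thread; the sketch below uses a made-up `Config` class. `writeReplace()` swaps the object for a small private proxy, and the proxy's `readResolve()` rebuilds the real object through its ordinary constructor, so constructor checks run again. It does not solve passing outside context into construction. The proxy itself still goes through JOS's reflective default path, so it is not a performance fix.

```java
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.Serializable;

final class Config implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String host;   // final fields, only ever set by the constructor
    private final int port;

    Config(String host, int port) {
        if (port <= 0) throw new IllegalArgumentException("bad port " + port);
        this.host = host;
        this.port = port;
    }

    String host() { return host; }
    int port() { return port; }

    // Serialize a proxy instead of this object.
    private Object writeReplace() {
        return new Proxy(host, port);
    }

    // Reject streams that try to deserialize Config directly, bypassing the proxy.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("proxy required");
    }

    private static final class Proxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String host;
        private final int port;

        Proxy(String host, int port) {
            this.host = host;
            this.port = port;
        }

        // Rebuild the real object through its normal constructor.
        private Object readResolve() {
            return new Config(host, port);
        }
    }
}
```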

~~~
10ren
To check I understand: you need to deserialize objects, and you also want to
pass in some arguments to some of those objects, so that their initialization
is based on both the serialized data and the arguments? Is that to do with
currying (is currying in Ruby?) In a complex object graph, how do you know
which arguments should go to which objects?

You could use a global object to hold these arguments (or, subclass Thread,
and associate the data with that, and yield while the deserialization runs in
that thread). Ugly, yes.
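
Here is the thread-associated-data suggestion, sketched with a `ThreadLocal` rather than a `Thread` subclass. Both do the same job, and `ThreadLocal` is the more common idiom. `DeserializationContext` and `Widget` are hypothetical names. The caller installs the extra argument before reading, and `readObject` picks it up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

// Hypothetical per-thread holder for data that deserialization needs
// but that is not in the stream.
final class DeserializationContext {
    private static final ThreadLocal<String> RUNTIME = new ThreadLocal<>();

    // Deserialize bytes with `runtime` visible to readObject() hooks on this thread.
    static Object read(byte[] bytes, String runtime)
            throws IOException, ClassNotFoundException {
        RUNTIME.set(runtime);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } finally {
            RUNTIME.remove();   // don't leak into later work on a pooled thread
        }
    }

    static String current() {
        return RUNTIME.get();
    }
}

class Widget implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;
    private transient String runtime;   // supplied by the context, not the stream

    Widget(String name, String runtime) {
        this.name = name;
        this.runtime = runtime;
    }

    String name() { return name; }
    String runtime() { return runtime; }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        runtime = DeserializationContext.current();
    }
}
```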

BTW: it is possible to set final fields outside a constructor, using some
hackery (by accessing the same hooks that JOS uses internally). A few
serialization tools use this, and I recall a project that consisted entirely
of providing nicer access to these hooks (I searched for it, but couldn't find
it).
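
Here is a minimal illustration of the reflective route, using a made-up `Frozen` class. It sticks to the public reflection API rather than the JDK-internal hooks JOS relies on. After `Field.setAccessible(true)`, `Field.set`/`setInt` can write a final *instance* field. The JDK documents that this does not work for static finals, or for fields of record or hidden classes.

```java
import java.lang.reflect.Field;

class Frozen {
    // A blank final (assigned in the constructor), so javac does not inline it
    // as a compile-time constant.
    private final int value;

    Frozen(int value) { this.value = value; }

    int value() { return value; }

    // Overwrite the final field after construction via reflection.
    static void forceValue(Frozen target, int newValue) throws ReflectiveOperationException {
        Field f = Frozen.class.getDeclaredField("value");
        f.setAccessible(true);   // required for write access to a final instance field
        f.setInt(target, newValue);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Frozen f = new Frozen(1);
        forceValue(f, 2);
        System.out.println(f.value()); // 2
    }
}
```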

