One VM to Rule Them All [pdf] (uni-linz.ac.at)
161 points by charlestuy on Aug 18, 2013 | 88 comments

This looks interesting, but I am generally worried about something being owned by Oracle. Simple example: I own an Openpandora. I wrote a live coding environment in Java with a very basic, lispy language at its core. With any free Java implementation it either did not start (core dump; I suspect there was not enough memory to get even the VM up) or ran unusably slowly. With the closed-source Java 7 ARM version from Oracle, it runs fast (just as smoothly as on my Macbook) and uses very little memory.

I have the same experience on x86, but because of the abundance of resources it's less noticeable.

I don't insist on everything being open source (completely), but base languages/VMs should be. Unless something about this changes, I'm not using anything owned by Oracle again.

The Graal OpenJDK project is available open source with a GPL license (http://openjdk.java.net/projects/graal/). The Truffle API includes a Classpath exception allowing language implementors free choice of license (http://mail.openjdk.java.net/pipermail/graal-dev/2013-August...). We welcome anybody who wants to work with us on moving language implementations co-existing on a single platform forward.

The problem is that nobody trusts Oracle. Have a look at what happened to Google. Their code didn't even originate from Oracle and they still got sued for made-up, bullshit reasons.

It of course sucks for those who invested their time in hacking the Oracle stack, but I think many people these days outside the Oracle bubble see Oracle as a dead-end.

Most people don't have a war chest full of money to hire the best lawyers like Google to protect them against baseless lawsuits.

Anyway, even from a technical point of view, things like Graal and Truffle are terrible workarounds and I prefer just fixing the things which are broken in the first place. Seeing that this will never happen at Oracle, I'll just work on code where the maintainers/owners are not so openly hostile.

But you see, this is exactly the wrong kind of conversation. You can't say "nobody trusts Oracle" while Oracle is bigger and more profitable than Google, and while the majority of important software worldwide is built on Oracle technology. Oracle won't pay attention to this kind of statement because it's obviously false. The truth is that most companies do trust Oracle.

What we can and should say is that Oracle is losing the trust of the Silicon Valley startup crowd, and that this is a very important population segment. It's like a candidate being elected but gradually losing the young vote – it's a bad sign that Oracle would do well to consider.

There's a lot that can be said about Oracle's stewardship of Java, what it's like and how it should improve; it's a very important conversation. But the tone of your comment (and some of its child comments) seems to take the discussion in a misinformed direction. I don't know how much of the world's important server-side software is written in Java (or runs on the JVM), but I wouldn't be surprised if it's around 80%. The fact of the matter is that aside from small-to-medium startups' servers, the world's server-side software is mostly Java. Almost all non-embedded, large defense software is Java; almost all banking software is Java; Facebook, Twitter, LinkedIn and Google are mostly Java/JVM shops (when it comes to their infrastructure software).

Saying "I'm not using anything owned by Oracle" is something most serious server-side developers just can't say, because, frankly, there's little choice. If you require high-performance, multi-million-LOC software developed by a large team, you would be quite foolish to choose anything but the JVM. So we should certainly discuss Oracle's behavior, but pretending that we can avoid Oracle software like it was a specific linux distribution is a misrepresentation of reality.

You are correct that the JVM is ubiquitous in server software but not nearly as much as you assume and there has been a shift away from it in recent years for reasons having nothing to do with Oracle etc. I've been designing high-performance server engines since the 1990s and have watched the architectures and tool chains evolve.

Five years ago most new server engine development was done on the JVM. Since then there has been a shift toward new development being done in C++ such that the vast majority of new server engines I know about are being developed in C++ (mostly C++11) across a diverse range of companies. The reasons are practical and reflect the evolution of hardware.

The short version is that on current hardware C++ is much more efficient in terms of throughput per core and can achieve integer-factor improvements in absolute performance relative to the JVM. Most of the differences boil down to two things. First, server performance tends to be bound by memory performance, and the JVM is quite a bit worse than what is easily achievable in C++. Second, an optimal high-performance server engine design in C++ is difficult to express within the JVM, so the basic design of the engine kernel tends to be less efficient as well.

In large-scale systems, that starts to add up in terms of power and hardware consumption, and companies are more sensitive to this than they used to be. C++ currently offers significantly better characteristics while using less hardware to do it.

I thought I knew most of what is hot and fresh on the software development front, but I've never heard of any "server engines." Mind explaining what it is? Virtualization containers perhaps?

A "server engine" (usually called a kernel actually) is like a userspace operating system kernel that is purpose-built for a class of server workloads and attempts to optimally manage system resources for that workload. The server application is built on top of that kernel. When people refer to e.g. "database kernels", this is what they are referring to. It reimplements the operating system resource management services through the OS APIs.

Writing an excellent server kernel requires a high level of technical ability and quite a bit of low-level code, since you have to reimplement most of the operating system resource management services most developers take for granted. Few pieces of open source server software are built on userspace kernels, and the ones that are, like PostgreSQL, are (currently) only partial kernels that still rely on the OS to do significant things a full kernel would reimplement. A properly designed database kernel, for example, is at least 100k LoC of low-level C/C++, and that is before you actually write the server application that sits on top of it. I've designed and written kernels for both database engines and network engines; it is not trivial.

So why would you want to go through all this effort instead of writing to the standard POSIX APIs directly? Because performance and scalability. For example, a well-designed disk buffer and scheduler for a database kernel can easily triple the I/O throughput possible with a highly tuned POSIX implementation. A lot of locking and blocking, both explicit and implicit in the OS, become unnecessary. Certain kinds of distributed system problems become easier to solve because you can adaptively schedule data flows at a fine-grained level. One of the reasons Oracle and DB2 scale so well and get such high throughput is that they are built on highly optimized userspace kernels.

Virtualization actually degrades the performance of high-performance server systems in part because the hypervisor acts as a primitive operating system underneath the operating system the server kernel can see. This does not impact non-kernel based servers as much.

Thanks for the explanation!

so are you saying that running Oracle in a VM will kill its performance? By what factor would you expect?

I think the key point you are missing is developer productivity.

Every potential performance gain C++ might have is completely negated by having to touch that complete clusterfuck of a language.

In the really large server-side systems we're talking about here (banks, defense, etc.) both developer productivity and maximum speed rank way, way below other factors, such as having long-lived processes.

Java's not a clusterfuck? I'll take the clusterfuck that's not controlled by a creepy asshole who thinks the NSA is essential and says stupid shit like "Who's ever heard of government misusing information?"

You are talking about Java right?</s>

I agree with you, but as I can choose what software I create and with what, I'm not very interested in working with Oracle stuff.

I worked with Java since 0.1 in the enterprise world for over 10 years, and with Clojure & Scala I would continue with it, but I don't trust Oracle to do the right thing in releasing source at the same pace, or at all, for some critical perf/scaling features, and I tend to think that's wrong.

How does OpenJDK handle serverside compared to the Oracle binaries these days?

I don't know how common it is in the industry, but it's a damn fine piece of software. It's also under Oracle's control, though.

Yes, but it's GPL right? So if it's a really fine piece of software, it can be forked (not arguing if that would be good or not).

Edit: and indeed it would be nice to know if a company like Google etc uses OpenJDK or the Oracle one.

I think Google already had a lot of fun with Oracle while not using their code.

If you fork, be prepared for a patent-infringement lawsuit as soon as you earn some money with it.

I'm pretty sure most companies have realized that staying the hell away from Oracle isn't a stupid move.

Actually, most companies that aren't Google (finance, government, military, enterprise) realized that simply sticking with Oracle is an even better move. I don't know if you know this, but Oracle is making more money (profit) than Google (Oracle is also a much larger company than Google). They wouldn't be doing that if people were staying away from them. Lawsuit aside, even Google itself (and Facebook, and Twitter) runs most/much of its software on Oracle technology, so I guess they're doing something right. It's just that HN is skewed in favor of smaller-scale software. When you're in the major league, Java is often the only smart option.

there may be more reasons other than technical that large businesses run on Oracle (database).

The amount of paid "consultants" that oracle pushes thru to make their sale, plus their tactic of charging what you can afford to pay, as well as the "can't get fired for buying IBM" mentality of large IT organizations means that oracle has a distinct advantage over any small competitor.

Not that oracle's tech isn't good, but there are comparable alternatives, but they are only comparable in the technical aspect, not in the (strongarm) sales aspect.

I was mainly referring to Java, not to Oracle DB. There aren't really good alternatives to Java.

.NET is a fine alternative in 80% of the use cases. Honestly, in half of the projects I see coming through, the only reason java is a requirement is because everything else is already in java; the only hang up is DB2/MSSQL/Oracle drivers.

.NET is never a good alternative to Java. The CLR suffers the same performance issues as the JVM, and as noted in the thread above, people are starting to migrate back to C++ as a result. Yet .NET has the further restriction that it only runs on Microsoft systems. So with .NET I get the disadvantages of Java combined with the restriction of not being able to migrate to other platforms. That's not a good deal for the enterprise.

Anybody can sue for anything. Google won that lawsuit.

True, but I think we mean the likelihood of being sued.

The likelihood of being sued approaches 1 as soon as you're successful in building something that makes money.

Yeah, but couldn't that happen with other open source products as well? Serious question, not sure if that happened or not.

> you would be quite foolish to choose anything but the JVM.

Do you have technical reasons for saying that, or only political ones?

Meaning the world is in corporate hands, sadly enough.

The world is in corporate hands, but Oracle and the JVM are far from being the worst of it. At least all Oracle wants is your money, unlike a younger kind of mega-corporation that will give you services free of charge only to hold your most secret information – e-mail, photos, current location, documents and more. Oracle can be infuriating, but it's not scary because it can be easily understood. What companies like Google want, well that's far scarier. It's also a completely different discussion. :)

True, true, and to hell with it I even like the JVM. I just don't differentiate google or facebook from oracle, they're equally "corporatish" for me :)


Oracle's flagship product can be replaced by Postgres? No?

ps. Why compare with the top 3 American internet giants, what's the point?

This is actually cool, but I'll wait for an open-source, patent-troll-protected, implementation. Oracle can even try the Microsoft solution of promising never to sue you for reimplementing their stuff, but, if I don't trust Microsoft to keep their promises, why would I trust Oracle?

BTW, Am I the only one who noticed the Lord of the Rings theme?

Probably Oracle now has patents on every tiny detail...

Yes. The first slide basically guaranteed that I will never look into this project.

The risk of touching anything Oracle-related is not worth the potential benefit.

After hearing how Solaris support went down the drain with Oracle stewardship, I'd say staying away from anything Oracle sounds reasonable. At least if it isn't under something like GPL with a patent grant.

(Yes, the fact that they screwed minor Solaris customers does not translate to them suing over IP -- but then again, they already did the last part too).

Oracle is probably a great company to invest in; they make a lot of money -- but that alone doesn't make them a good stakeholder in your core business. I wouldn't want to partner with McDonalds either.

How very open-minded of you. At least consider the project on its merits, not on your distaste for a particular corporation.

Being under control by Oracle is part of its merits, and it counts massively negatively for a lot of us.

On HN perhaps, but not in the big-corporation league.

This is amazing.

First because it means we can have _fast_ versions of existing languages.

Second because we can interact with the _huge_ amount of JVM libraries (this is a very big deal).

Third because SubstrateVM seems to be enabling the things I really like about Go: low memory footprint, fast startup time, and easy deployment (give me a binary that does everything I need).

They just need to make sure native interop is easy (both C and C++), and we have a winner!

So what is the Substrate VM? The main presentation appears to be about Truffle, a language-implementation framework, which looks neat by itself.

The "Substrate VM Execution Model" slide talks about Ahead Of Time Compilation, which makes it sound like it might actually be the LLVM-based Substrate VM: http://vmkit.llvm.org. Alternatively, could it be a version of Maxine?

AOT as an option certainly sounds interesting...

The SVM here is unrelated to the LLVM project. The SVM allows you to take a Java program (with some restrictions) and turn it into a single static binary that is the JVM, libraries and the program all compiled and optimised together. So you can compile our Ruby implementation and get a ruby binary just like MRI - no JVM needed at runtime. Then startup time is about the same as MRI.

Which representation is used to produce the static binary? Is it a native binary? What is the backend? Gcc, LLVM?

It's just a Java program that writes bytes to a file in the object format and machine code of whatever system you're targeting. It doesn't use any other assembler or compiler apart from Graal (which is also in Java).

Graal is Maxine's successor.

Oracle is planning eventually to replace the C++ based JIT with Graal in some version following 8.

Source? I'd like to read up a bit more about this. Update: found this: http://lafo.ssw.uni-linz.ac.at/papers/2012_SPLASH_Truffle_Sl...

No bytecode in between, which sounds a lot like V8, and incidentally that guy worked on V8 previously.

That is really, really, cool.

Definitely an interesting project.

It would be interesting to compare this to parrot[1], which had similar goals.

Javascript and the JVM are currently occupying this niche to a certain degree, although neither was originally designed for it.

1: http://www.parrot.org/

Anyone having an idea why parrot is not more popular?

Because it's not really a good VM.

It's full of good ideas, but also full of cruft and years of technical debt. It doesn't have a JIT compiler, (nearly) no async IO, threading support is pretty new and not battle-tested nor documented very well.

Sounds like Perl 6!

Actually, that's part of the issue. Most of Parrot's semantics are exactly those of Perl 6 (right down to the opcodes for Boolean testing), and those don't get along well with most languages.

Rakudo on the JVM does use a JIT compiler. Surprise!

The fact that Rakudo on the JVM can take advantage of the JIT to optimize simple math in tight loops fails to impress me, but I suppose it's one more half-implemented feature you can tick off the back of the box, so hooray.

Because supporting the Rakudo attempt to implement Perl 6 turned out to be an unpleasant slog through an unnavigable field of abuse and nonsense:


Because it's associated with Perl 6 which is commonly believed to be vaporware (yes, even if there have been releases). And because they want to make it so perfect that people can't remember why they used to be excited about it.

FWIW, the problem with there having been "releases" is that there is no reference release; Larry Wall is uninterested in doing an implementation.

>Because it's associated with Perl 6 which is commonly believed to be vaporware

Or rather just "because it's associated with Perl".

I couldn't gather much additional information from the slides, but do I understand this correctly that this substrate vm is an alternative jvm implementation that is faster and requires less memory? Would it be possible to port the usual jvm projects like Clojure or Scala to this in order to claim the same speed/memory gains as the described ruby implementation?

The speed / memory gains seem to be because of optimising away the dynamic type overhead.

So Clojure yes but Scala, no.

This is the first time there are any public numbers on Topaz, the RPython / PyPy implementation of Ruby.

Note: only two benchmarks are presented in the paper, and as with any benchmark, take them with a grain of salt.

If it wasn't for JVM Truffle, Topaz would be the fastest implementation of Ruby, and by a large margin. Assuming Ruby 2.0 was 5x faster than Ruby 1.8, Topaz is 8-10x faster than Ruby 2.0!

And Truffle is about 1.5-2.5x faster than Topaz.

Now I want to know whether there would be a better, faster FFI for SVM.

And how far are we from seeing a release of SVM?

Huh? Where is there any benchmark of Topaz in the paper? All there is involving Topaz is the percentage of RubySpec passed — nothing about performance!

Um... it's right there in the performance chart?

How the hell did I miss that when I looked at that before? Ignore me.

LOL, I can't wait to see all of these coming: Topaz, Ruby 2.1, Rubinius 2.0, and this Ruby with SVM.

The Javadoc shows this to be some sort of a general VM construction kit, with access to some very low-level stuff (machine code generation): http://lafo.ssw.uni-linz.ac.at/javadoc/graalvm/all/index.htm...

Now if the SVM is to have a very fast FFI I think it would have the potential to be awesome. Especially as the SVM seems to be easily embeddable...

Neat. However, I had a hard time quieting my Algebra teacher's memory from tut-tutting the example's lack of the other quadratic root.

And the formula given has at least two bugs, I think. They give it as

    -b + (Math.sqrt(b**2 - 4*a*c)) / 2*a

Shouldn't it be

    (-b + (Math.sqrt(b**2 - 4*a*c))) / (2*a)

(still ignoring the other root, of course).

Even weirder, the tree in their presentation seems to represent:

    -b + (Math.sqrt(b**2 - 4*a*c)) / (2*a)

Which is neither the formula we'd expect nor the formula they show.
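
To make the precedence issue concrete, here is a small Ruby sketch that evaluates the three variants side by side. The method names and the sample coefficients are assumptions chosen for illustration; only the three expressions come from the comment above.

```ruby
# Discriminant shared by all three variants.
def discriminant(a, b, c)
  b**2 - 4*a*c
end

# The formula as printed on the slide: `/ 2*a` parses as `(x / 2) * a`,
# and only the square root term is divided at all.
def slide_formula(a, b, c)
  -b + (Math.sqrt(discriminant(a, b, c))) / 2*a
end

# What the tree on the slide seems to represent: the divisor is right,
# but -b is still outside the division.
def slide_tree(a, b, c)
  -b + (Math.sqrt(discriminant(a, b, c))) / (2*a)
end

# The usual quadratic formula (the "+" root only, like the slide).
def quadratic_root(a, b, c)
  (-b + Math.sqrt(discriminant(a, b, c))) / (2*a)
end

# Hypothetical sample: 2x^2 - 6x + 4 = 2(x - 1)(x - 2), so the roots are 1 and 2.
a, b, c = 2.0, -6.0, 4.0
puts slide_formula(a, b, c)   # 8.0
puts slide_tree(a, b, c)      # 6.5
puts quadratic_root(a, b, c)  # 2.0
```

With these coefficients only the fully parenthesised form lands on an actual root. Note that with a = 1 the first two variants would agree with each other, which is probably how this kind of slip goes unnoticed.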

Apologies for my maths - I think I simplified the serialised expression to reduce space on the slide or something like that, and broke it in the process.

I understand you're an insider, so can you please explain what is the actual breakthrough compared to the previous attempts to make a faster Ruby?

And how does your approach fare compared to the state-of-the-art JIT implementations? Can your solution produce a faster Lua than LuaJIT, for example? Why start with Ruby, of which you implement 40%? And what have you used from JRuby?

There are several techniques working together here. The Truffle system allows the running program to gradually become statically typed over time (slide 9), whereas JRuby has to go through a generic IRubyObject type for almost everything. Also, where JRuby has to continually check that methods have not been redefined, we never check and instead we go in and stop the running machine code when a method is redefined (slide 34).
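
As a rough illustration of the "gradually become statically typed" idea, here is a toy Ruby sketch of an addition node that records which operand types it has seen and specialises on them. This is not the Truffle API: the `AddNode` class, its `state` field, and the state names are all hypothetical, and in plain Ruby both paths end in the same `+` call, so only the state transitions are meaningful.

```ruby
# Toy model of a self-specialising AST node (hypothetical, not Truffle code).
class AddNode
  attr_reader :state

  def initialize
    @state = :uninitialized
  end

  def execute(left, right)
    case @state
    when :integer
      # Specialised path: guarded by a cheap type test.
      return left + right if left.is_a?(Integer) && right.is_a?(Integer)
      # Guard failed: give up the specialisation for good.
      @state = :generic
    when :uninitialized
      # First execution: specialise on the operand types actually observed.
      both_ints = left.is_a?(Integer) && right.is_a?(Integer)
      @state = both_ints ? :integer : :generic
    end
    # Generic path: full dynamic dispatch.
    left + right
  end
end

node = AddNode.new
puts node.execute(1, 2)    # 3
puts node.state            # integer
puts node.execute(1.5, 2)  # 3.5
puts node.state            # generic
```

The design point is that a node which stays in the `:integer` state gives a compiler a statically typed operation to work with, while the generic state remains as the safe fallback.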

The advantage of running on the Substrate VM is that, unlike JRuby, our startup time is about the same as MRI (slide 19).

There are several languages using this system, by us (JS, Ruby), by academic partners (Python, R), and by others (Smalltalk is one I know of). This talk just used Ruby as an example.

We used the parser from JRuby.

How is this related to the JVM? I understand there's the compilation/type-information-flow part, and the exposing the JVM's internals part. How does it all fit together?

For those looking for alternatives to an Oracle implementation of this concept there is http://www.ffconsultancy.com/ocaml/hlvm/ Which uses LLVM underneath.

Caveat: I've not used HLVM or looked at its implementation, but the scope seems very similar.

Looks interesting. Is this anything more than an internal research project?

From the second slide: "The following is intended to provide some insight into a line of research in Oracle Labs."

I hope this is not yet another project that holds a ton of promise and never ends up delivering, because it looks absolutely awesome on paper.

If only it wasn't Oracle...

Then I'd actually read this powerpoint!

But in all seriousness, if this becomes like LLVM and well developed it could be extremely helpful.

However, I'm sure Oracle would find an awesome way to make it "oracle enterprise trademarked" and make it into some silly product that aggravates more people than even Java, which might be impressive!

I thought the Ruby VM already did invalidate() on things rather than checking, and that was a common practice on dynamic languages (doesn't the Objective-C runtime do this also?). I am really excited for this but agree that it is bittersweet the project is owned by Oracle.
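
For readers unfamiliar with the check-versus-invalidate distinction, here is a toy Ruby sketch of the invalidation side. It is a model of the general idea only, not how Graal, MRI, or the Objective-C runtime implement it; `MethodTable`, `CallSite`, and the `resolutions` counter are hypothetical names. The call site's fast path performs no validity check, and a redefinition pushes the call site back onto its slow path.

```ruby
# Toy method table that notifies dependent call sites on redefinition.
class MethodTable
  def initialize
    @bodies = {}
    @dependents = Hash.new { |hash, name| hash[name] = [] }
  end

  # (Re)define a method and invalidate every call site bound to the old body.
  def define(name, &body)
    @bodies[name] = body
    @dependents[name].each(&:invalidate!)
    @dependents[name].clear
  end

  # Slow path: look the method up and remember who depends on it.
  def resolve(name, site)
    @dependents[name] << site
    @bodies.fetch(name)
  end
end

class CallSite
  attr_reader :resolutions

  def initialize(table, name)
    @table = table
    @name = name
    @resolutions = 0
    invalidate!
  end

  # Reset to the slow path, which re-resolves and then patches itself out.
  def invalidate!
    @target = lambda do |*args|
      @resolutions += 1
      @target = @table.resolve(@name, self)
      @target.call(*args)
    end
  end

  # Fast path: no "has this been redefined?" check here.
  def call(*args)
    @target.call(*args)
  end
end

table = MethodTable.new
table.define(:double) { |x| x * 2 }
site = CallSite.new(table, :double)
puts site.call(21)                       # 42
puts site.call(4)                        # 8
table.define(:double) { |x| x + x + 1 }  # redefinition invalidates the site
puts site.call(4)                        # 9
puts site.resolutions                    # 2
```

The cost moves from every call to the rare redefinition, which is the trade-off being discussed here.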

Looks interesting -- too bad it's controlled by a creepy asshole who thinks the NSA is essential and says stupid shit like "Who's ever heard of government misusing information?"

I fail to see the difference between this and LLVM. Oh yes LLVM is BSD license!

LLVM isn't focused on JIT compilation or GC. Obviously you can implement both using LLVM, and I think there is still a rudimentary JIT built in, but if your language wants to support cross-platform binary distribution, runtime optimization or world-class GC, you will have a lot more work ahead of you with LLVM.

Some seriously impressive numbers. Hope they take this further :)

Add it to some benchmark sites like http://www.techempower.com/benchmarks/#section=data-r6

It would be interesting to see how it compares to OpenResty (LuaJIT), Node.js (V8 JS engine) & co.
