Hacker News
In what cases is Java faster than C++? (quora.com)
63 points by btipling 1523 days ago | 57 comments



I write a lot of hand-optimized Java code, and have found that while Java as a language can be pretty fast, and the HotSpot team is to be commended, Sun's libraries are often atrocious. For example, until Java 1.5 (I think, maybe 1.4), ArrayList's get(), set(), and add() methods were not inline-able, and a one-line fix for the problem languished for years on Sun's bug forums. Sun's decision to go with type erasure was also a huge mistake in my opinion: as a result we have hilariously inefficient boxing and unboxing operations in poorly written Java code. Sun's push towards iterators hasn't helped things either.

If you eschew all this and write strongly optimized code, I think the single biggest spot where Java is unquestionably slower than C/C++ is array accesses. In Java, to set a slot in a two-dimensional array, the JVM must first test that the array is non-null, then that the X index is in bounds, then that the appropriate Y subarray is non-null, then that the Y index is in bounds, and only then set the value.

In C the compiler does a multiply and an add and sets the slot.
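To make the contrast concrete, here is a sketch in C (illustrative names, using a flat row-major array rather than pointer-to-pointer storage): the generated code for each access is one multiply, one add, and a load or store, with no null or bounds checks.

```c
#include <stddef.h>

/* Row-major 2D access: index = x * width + y. This is essentially all
   the machine code a C compiler needs to emit for a slot access. */
static void set_slot(double *arr, size_t width, size_t x, size_t y, double v)
{
    arr[x * width + y] = v;      /* multiply, add, store */
}

static double get_slot(const double *arr, size_t width, size_t x, size_t y)
{
    return arr[x * width + y];   /* multiply, add, load */
}
```

A Java JIT can hoist or eliminate some of its checks inside hot loops, but it cannot promise to remove all of them.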

-----


It's worth noting that proposals, like JSR 83 (http://jcp.org/en/jsr/detail?id=83), to address Java's multi-dimensional array shortcomings have been around for quite some time. Of course, JSR 83 just sat around for five years before being abandoned; maybe Java folks don't think efficient multi-dimensional arrays are useful and/or common.

-----


Yes, the array bounds checks are inefficient, but IIRC, they're there to prevent buffer overflows that are/were prevalent in C applications.

-----


We're definitely still on the "are" side of that one, unfortunately.

-----


Sure, of course. But safety isn't the discussion.

-----


Have you tried the JRockit JVM? I was benching various optimisations on JRockit and HotSpot, and found array accesses were far less costly in JRockit.

-----


Cliff Click nailed the whole Java vs C/C++ performance subject in one terrific post - http://www.azulsystems.com/blog/cliff-click/2009-09-06-java-... . Seriously, no further discussion is needed on this topic until even more significant performance advances are made, either in C/C++ compilers or in the JVM :)

-----


That article is naive. It assumes that C/C++ developers concerned with performance write object-oriented code. They do not. They write data-oriented code.

Most of the multi-threading issues go away. This is how fast code is written for the Cell processor in the Sony PS3.

Most malloc/free issues go away. Most data is a value type or has an explicit lifetime (game-lifetime, mission-lifetime, frame-lifetime for the game development case).

More information on data-oriented design: http://gamesfromwithin.com/data-oriented-design
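As a minimal sketch of the idea (hypothetical `Particle` layout, not taken from the linked article): data-oriented design replaces an array-of-structs with a struct-of-arrays, so a pass that touches one field streams through contiguous memory instead of dragging unused fields through the cache.

```c
#include <stddef.h>

/* Array-of-structs: updating positions pulls mass and id through
   the cache along with every x/y/z. */
typedef struct { float x, y, z; float mass; int id; } Particle;

/* Struct-of-arrays: each field is its own contiguous array, so a
   position-only pass wastes no cache lines. */
typedef struct {
    float *x, *y, *z;
    float *mass;
    int   *id;
    size_t count;
} Particles;

static void advance_x(Particles *p, float dx)
{
    for (size_t i = 0; i < p->count; i++)
        p->x[i] += dx;        /* touches only the x array */
}
```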

-----


I didn't read through the link fully, but it sounds to me like, after changing the way you program, you can get C++ to deliver the kind of performance a JVM gives you without doing anything special? Or are you saying that with data-oriented programming C++ becomes significantly faster than Java?

Given that most people aren't going to change their programming methodology just to avoid Java, and that C++ still has its niche usages, I don't think the article is naive in any way.

-----


Data-oriented programming produces programs that are an order of magnitude faster than object-oriented programs. It's the same kind of difference as stateful vs stateless code, or RESTful APIs vs RPC.

If you care about performance and scalability then you write stateless code and use RESTful interfaces. You also choose to write data-oriented code rather than object-oriented code.

Data-oriented code is not possible in Java because you can't create complex value types and you can't control when and where memory gets allocated and deallocated.

-----


> Data-oriented code is not possible in Java...

There are obviously techniques in data-oriented code that aren't possible in Java, but a lot of the key insight is applicable in just about every language. Structures-of-arrays, defining the data in objects based on usage patterns instead of responsibilities and "model-the-world" categorisation...

Java doesn't have a `sizeof` operator, objects don't tend to store their member objects by value, and it's not always obvious which function calls cost how much... Problems, to be sure, but if you really want data-oriented code you can usually contort yourself far enough to get it.

-----


That's a fascinating article, and you may be right that little more need be said, but the author wanders WAY off the subject when he gets to the benefits of Java. Claiming that you can use the time saved by having garbage collection to make your program more efficient in other ways does not make a convincing argument when you're discussing fine points of performance.

In case that wasn't clear, he claims not that garbage collection is more efficient, but that the saved programmer time can be used in some hand-waving way to improve efficiency. Likewise he wanders off into woolly territory with the claim that very large Java programs can be written more quickly. These would be perfectly fine if he were posting the reasons he thinks Java is better, but in a technical discussion of all-out performance it's just off-topic.

-----


Strictly speaking, what you say is true, but practically those are significant and relevant benefits of Java.

Considering that performance is always a trade-off with price - if you think about the price/performance ratio - the ability to write programs quickly is relevant.

-----


  #include <stdlib.h>

  typedef struct quux { int x; } quux_t;

  quux_t *foo(int i) {
      /* Trick to prevent compiler from inlining */
      if(i == 0) {
          quux_t *bar = (quux_t *) malloc(sizeof(quux_t));
          return bar;
      } else {
          return foo(0);
      }
  }
Seriously? GCC removes this obfuscation at -O2, Clang does it at -O1. Check the disassembly before making stupid claims like this.

-----


He can use __attribute__ to enforce noinline: http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

Though the attribute is GCC-specific (it works in Clang as well), I've found it tremendously useful in some cases.
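For reference, a minimal sketch of the attribute in use (hypothetical function name), which keeps a call out-of-line without the branching trick from the earlier snippet:

```c
#include <stdlib.h>

/* __attribute__((noinline)) tells GCC/Clang never to inline this
   function, so an allocation micro-benchmark measures a real call
   rather than whatever the optimizer folds away. */
__attribute__((noinline))
void *alloc_quux(size_t n)
{
    return malloc(n);
}
```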

-----


Well, note how the article said "naive" memory allocation, and even then it was specific to a microbenchmark. I've written very high-performance memory allocation routines for multi-threaded scenarios, and if you know the types of allocations you'll be making, you can leave any GC or standard memory allocation routines in the dust!

-----


In C or C++ you can even allocate objects on the stack, which has only the cost of increasing the stack pointer (meaning it is very cheap), and also you have precise control over the object's lifetime.
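A tiny sketch of what that looks like in C (illustrative `complex_t` type): the local lives in the current stack frame, so "allocating" it costs nothing beyond the stack-pointer adjustment the function performs anyway, and it is reclaimed automatically on return.

```c
typedef struct { double re, im; } complex_t;

/* The temporary lives entirely on the stack: no malloc, no GC,
   and its lifetime ends precisely when the function returns. */
static double norm_squared(double re, double im)
{
    complex_t c = { re, im };
    return c.re * c.re + c.im * c.im;
}
```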

-----


Note that by using a technique known as escape analysis, a virtual machine like the JVM can detect that the lifetime of an object is such that it doesn't leave the context of a particular method and then stack-allocate the object implicitly. That may seem like a limited optimization, but when you combine it with method inlining it gets a lot more useful. I believe the optimization is turned off by default in the Sun/Oracle JVM but can be enabled via the -XX:+DoEscapeAnalysis option. For programs that allocate large numbers of short-lived objects, it can make a significant performance difference.

-----


Note: If you use scala, escape analysis can be particularly helpful. The "pimp my library" pattern for extending classes involves creating wrapper objects, and these can often be optimized away. http://www.decodified.com/scala/2010/08/27/scala-rich-wrappi... has more details

-----


Often that allocation should be free - when you do a "proper" function call you'll be increasing that pointer already, and the total size of stack-bound objects in many functions can be determined at compile time.

(Erm... I think. Need to read more...)

-----


Exactly. What a lot of dynamic VM languages provide are very clean semantics and the ability to prototype very quickly. Competently tuned GC with a modern VM will get you beyond the ability of a bad or mediocre C coder, calendar-time faster, though at the price of more machine resource use.

I'm waiting for technology that makes this trade-off obsolete. One should be able to transition from fast prototyping to solid and optimal production code -- incrementally and without great pain. We as an industry are just about ready for this.

-----


I'm lost - what does "competently tuned GC" mean? There is some way to get the GC to NOT ever run? not take forever? What?

And even if you contrive your app to use memory very very carefully, with a shared runtime with 100 other apps (e.g. on a server) not everybody is as nice and GC still runs/stalls the system.

-----


> I'm lost - what does "competently tuned GC" mean?

You can easily get generational GC to be an order of magnitude slower than it should be by changing just one or two settings.

> There is some way to get the GC to NOT ever run?

Yes, this actually came up at Smalltalk Solutions many years ago. One presenter was using Squeak as an advanced debugger at a company producing an FPS game. If all your functionality happens between frames, you can rig it so you never GC: you just throw away all of your memory outside of "perm" space every time.

With VisualWorks Smalltalk, you can change the settings so that the bulk of your GC work happens using incremental GC. It's not uncommon to get to the point where GC never takes up more than a few milliseconds. That's plenty good for most people. Admittedly that's not so good if your "light" request load is well over 1000 transactions per server per second.

> And even if you contrive your app to use memory very very carefully, with a shared runtime with 100 other apps (e.g. on a server) not everybody is as nice and GC still runs/stalls the system.

When you need enough virtual hosts to be running 100 server processes -- that's likely when you need to be transitioning out of "rapid prototyping" mode and onto processes with just a little more rigor.

What I'm advocating is a language where both the rapid prototyping and running optimized mature code efficiently is possible. Not only possible, but easy to transition between.

-----


The default malloc and free, depending on the platform, can be ridiculously slow operations (on the order of 50-100µs). Those numbers are old; they're from testing I did nearly a decade ago. I'd hope things have improved by now.

But at that time, I wrote a memory manager whose malloc and free calls were an order of magnitude faster, at least ten times.

When you need it to be fast, it can be fast. If you don't want it to be fast, or you have to resort to tricks like preventing the compiler from inlining (wtf?), then you really are doing it wrong.
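As an illustration of the kind of thing such a memory manager can do (a minimal sketch, not the poster's actual code): a fixed-size free-list pool makes both alloc and free a single pointer swap, with no searching, splitting, or coalescing.

```c
#include <stddef.h>

/* Fixed-size pool allocator: every free slot holds a pointer to the
   next free slot, so alloc pops the list head and free pushes it back. */
typedef struct pool_node { struct pool_node *next; } pool_node;

typedef struct { pool_node *free_list; } pool_t;

/* Thread `count` objects of `obj_size` bytes (obj_size must be at
   least sizeof(pool_node)) from `buf` onto the free list. */
static void pool_init(pool_t *p, void *buf, size_t obj_size, size_t count)
{
    char *mem = buf;
    p->free_list = NULL;
    for (size_t i = 0; i < count; i++) {
        pool_node *n = (pool_node *)(mem + i * obj_size);
        n->next = p->free_list;
        p->free_list = n;
    }
}

static void *pool_alloc(pool_t *p)
{
    pool_node *n = p->free_list;
    if (n) p->free_list = n->next;    /* pop head; NULL when exhausted */
    return n;
}

static void pool_free(pool_t *p, void *obj)
{
    pool_node *n = obj;
    n->next = p->free_list;           /* push back onto the list */
    p->free_list = n;
}
```

The trade-off is the usual one: this only works when all allocations share one size and lifetime discipline, which is exactly the knowledge a general-purpose malloc cannot assume.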

-----


Really? One allocation every 50 to 100ms? That would mean you could only do 10 to 20 memory allocations per second. Now, I know that malloc is often very slow, but that seems unlikely.

Or do you mean 50 to 100 microseconds? That's still horrible, but I could believe it.

-----


Oops, yeah, I meant microseconds. Will fix it.

-----


Thank you. Malloc can be pretty bad, but we don't want to start any crazy rumors about it. :-)

-----


50-100ms is just horrible. In fact, the best mallocs show operations in 50-100 nanoseconds.

http://goog-perftools.sourceforge.net/doc/tcmalloc.html

-----


Yes, even glibc malloc is usually around 400 cycles (about 150 nanoseconds) in my tests.

-----


Windows "garbage collects" in its memory manager anyway, so there's no actual benefit to using C++ etc. there. It uses a pool with fragmentation-handling algorithms that can take hundreds of milliseconds to complete.

-----


A more general discussion about performance between C++ and Java can be found below:

http://stackoverflow.com/questions/145110/c-performance-vs-j...

-----


Found a great comparison between the answers:

http://zi.fi/shootout/

It is basically the great programming language shootout at http://shootout.alioth.debian.org/ redone with different GCC optimization levels.

It is clearly visible where Java HotSpot shines, but in most cases C++ wins by a factor of two. It also describes a technique for doing profile-guided optimization with GCC that makes optimizations similar to HotSpot's.

-----


>> It is basically ... http://shootout.alioth.debian.org/ <<

Wrong.

It is basically the old Doug Bagley programs which were replaced 3 years before the zi.fi/shootout article was posted.

A little history - http://c2.com/cgi/wiki?GreatComputerLanguageShootout

In this new decade -

http://shootout.alioth.debian.org/u64q/java.php#faster-progr...

-----


A lot of replies there are from C#/managed guys who don't quite know what they're talking about, but fake it convincingly enough. See flyswat, Orion Adrian, Jon Norton, etc.

-----


This reminds me of a discussion from many years ago: "In what cases is C faster than assembler?" That sounded as weird as the current question, but it wasn't.

At that time, C optimizers were getting better and better, up to the point where they knew "tricks" and you had to be a very, very good assembler programmer to compete. History seems to repeat itself.

-----


From this article, it seems that the answer is: "When the C++ coder is intentionally trying to write slow code."

-----


An excerpt from http://www.azulsystems.com/blog/cliff-click/2009-09-06-java-...

>> "Value Types, such as a 'Complex' type require a full object in Java. This has both code speed and memory overheads."

I have programmed in C# and C++ (never Java), but I found the above to be the key issue why my programs would run significantly slower with C#. Here's the C# language discussion thread where I posted details about my issue:

http://social.msdn.microsoft.com/Forums/en/csharplanguage/th...

-----


Is the test in the article a valid test? In the Java case, wouldn't the program quit before the garbage collector gets a chance to run? Wouldn't that be kind of like running the C program without free()?

-----


No.

Yes.

Yes.

The benchmark code also re-parses the integer command-line argument on every iteration of the loop.

This benchmark is meaningless.

-----


Wrong question. It doesn't consider associated drawbacks of the JVM strategy. Think multi-second pauses for full GC scans on real scenarios instead of artificial benchmarks.

-----


I assume you're talking about the GC defaults. It's not hard to switch it to doing incremental garbage collection, or one of the other numerous strategies available.

-----


Incremental GC is yet another JVM trade-off. It trashes the CPU cache all the time, and with a 100ns slowdown for each cache miss, the gains I've seen on real-life code are not even remotely close to what's advertised.

And Java objects in real life tend to be big and not very local, from what I've seen.

-----


Well, everything is a trade off. It depends what you're running.

I use incgc with very good results compared to the defaults. The defaults usually result in pauses (not good for something like mibbit), and they simply can't keep up with object churn.

So, anecdotally, incgc works wonderfully.

-----


I guess both Java and OCaml will see that the object can be generated on the stack which makes it much faster.

By using malloc in the C implementation, you force it to be on the heap.

-----


Actually, OCaml never generates objects like this on the stack - value creation is always taken literally. The OCaml compiler is really allocating all those records. It's just that their lifetime is almost zero, so the cost to clean them up is zero (as none get promoted from the minor heap to the major heap). And the cost to create them is almost zero, just a += and a compare.

-----


Ah, interesting. Does it also have a GC, or some custom allocation implementation of its own?

Because if it really used something like malloc, I don't really see how it could be faster than the C version.

-----


The JVM is written in C/C++, so technically Java cannot be faster than C/C++, because everything in Java is a (direct or indirect) product of C/C++. :)

-----


Java's graphic operations are much faster than some arbitrary C++ library which hasn't been optimised as well.

-----


X's Y operations are much faster than some arbitrary Z library which hasn't been optimised as well.

-----


Java is never definitively faster than C++, because there's nothing Java does that can't be done in C++, including JIT optimizations.

-----


This is not true. The runtime environment for Java is capable of changing the program on the fly in response to dynamically determined bottlenecks. To implement such a system in C would be to re-implement Java.

What you cannot do in C/C++ that you can in Java is respond to program inefficiencies at runtime, with runtime knowledge.

-----


That's true, but most programs don't encounter situations where a whole new class of unforeseen optimizations are needed right now, /and/ it's possible to make a significant improvement without changing any algorithms.

For everything else, there is Profile Guided Optimization.

-----


> To implement such a system in C would be to re-implement Java.

Why not use clang compiling to LLVM bytecode and JIT from that?

-----


You are mistaken. There are JIT libraries for C++. There are also dynamic profiling compilers (intel).

-----


Are you talking about profile-guided optimization, or a new Intel compiler that produces an application capable of dynamically rebuilding its own binary code on the fly? Presumably in the event that it can detect a better way to execute a chunk of code.

There are JIT libraries for C++ for building applications which dynamically compile stuff; I'm not aware of any that dynamically recompile the C++ itself, though.

-----


C++ with a JIT is called Java.

Seriously though, this is misguided. PGO is clearly a step in the right direction, but it's simply deferred static optimization. Either you optimize in C at compile time, or with PGO by running a few statistically representative executions; neither is dynamic.

The only way to make truly dynamic optimizations is to have a runtime environment and code that is interpreted.

The argument that it's not often necessary is a different argument, and with only a few moments' thought it can be seen to be untrue. Take, for example, an strstr-like operation that you naively implement by walking through the source string. Let's say you do this a lot, and all is good because the input is short; then (in a parser, for example) you receive something much, much bigger. A JIT is privy to runtime information and can determine that building an index on the source string, to make repeated finds faster, is worthwhile. C cannot, even with PGO, because you may never have run a test that encounters this situation.
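The strstr example can be sketched in C (illustrative `find_sub` helper): the linear-scan strategy is frozen into the binary, and no amount of static compilation lets it switch to an indexed search when the runtime inputs turn out to be large.

```c
#include <stddef.h>
#include <string.h>

/* Naive substring search, fixed at compile time. A C compiler (even
   with PGO) cannot replace this with an index-based search when the
   haystack is huge at runtime; a JIT with runtime profiles could. */
static long find_sub(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (const char *p = hay; *p; p++)
        if (strncmp(p, needle, n) == 0)
            return p - hay;   /* offset of first match */
    return -1;                /* not found */
}
```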

-----


I saw benchmarks posted comparing a Java supercompiler vs C, and Java + supercompiler actually beat raw C for quite a few test cases. I can't seem to find the study I saw, but if you search for "supercompilers" there's lots of information available.

-----

