
In what cases is Java faster than C++? - btipling
http://www.quora.com/In-what-cases-is-Java-faster-if-at-all-than-C#ans386926
======
SeanLuke
I write a lot of hand-optimized Java code, and have found that while Java as a
language can be pretty fast, and its HotSpot team is to be commended, Sun's
_libraries_ are often atrocious. For example, until Java 1.5 (I think, maybe
1.4) ArrayList's get(), set(), and add() methods were not inline-able, and a
one-line fix to the problem languished for years on Sun's bug forums. Sun's
decision to go with type erasure was also a huge mistake in my opinion: as a
result we have hilariously inefficient boxing and unboxing operations in
poorly written Java code. Sun's push towards iterators hasn't helped things
either.

If you eschew all this and write strongly optimized code, I think the single
biggest spot where Java is unquestionably slower than C/C++ is in array
accesses. To set a slot in a two-dimensional array, Java must first test that
the array is non-null, then test that the X bounds are correct, then test that
the appropriate Y subarray is non-null, then test that the Y bounds are
correct, and only then set the value.

In C the compiler does a multiply and an add and sets the slot.

~~~
kirktrue
Yes, the array bounds checks are inefficient, but IIRC, they're there to
prevent buffer overflows that are/were prevalent in C applications.

~~~
getsat
We're definitely still on the "are" side of that one, unfortunately.

------
blinkingled
Cliff Click nailed the whole Java vs C/C++ performance subject in one terrific
post - <http://www.azulsystems.com/blog/cliff-click/2009-09-06-java-vs-c-performanceagain>.
Seriously, no further discussion
is needed on this topic until even more significant performance advances are
made either in C/C++ compilers or the JVM :)

~~~
pkaler
That article is naive. It assumes that C/C++ developers concerned with
performance write object-oriented code. They do not. They write data-oriented
code.

Most of the multi-threading issues go away. This is how fast code is written
for the Cell processor in the Sony PS3.

Most malloc/free issues go away. Most data is a value type or has an explicit
lifetime (game-lifetime, mission-lifetime, frame-lifetime for the game
development case).

More information on data-oriented design:
<http://gamesfromwithin.com/data-oriented-design>

~~~
blinkingled
I didn't read through the link fully but it sounds to me like after changing
the way you program you can get C++ to deliver the kind of performance a JVM
gives you without doing anything special? Or are you saying with data-oriented
programming C++ becomes significantly faster than Java?

To the extent that most people aren't going to change their programming
methodology just to avoid Java and C++ still has its niche usages - I don't
think the article is naive in any way.

~~~
pkaler
Data-oriented programming produces programs that are an order of magnitude
faster than object-oriented programs. It's the same difference as stateful vs
stateless code. And it's the same difference as RESTful APIs vs RPC.

If you care about performance and scalability then you write stateless code
and use RESTful interfaces. You also choose to write data-oriented code rather
than object-oriented code.

Data-oriented code is not possible in Java because you can't create complex
value types and you can't control when and where memory gets allocated and
deallocated.

~~~
repsilat
> Data-oriented code is not possible in Java...

There are obviously techniques in data-oriented code that aren't possible in
Java, but a lot of the key insight is applicable in just about every language.
Structures-of-arrays, defining the data in objects based on usage patterns
instead of responsibilities and "model-the-world" categorisation...

Java doesn't have a `sizeof` operator, and objects probably don't tend to
store their object member variables by value, and it's not always obvious
which function calls cost how much... Problems, to be sure, but if you
_really_ want data-oriented code you can usually contort yourself far enough
to get it.

------
jedbrown

      quux_t *foo(int i) {
          /* Trick to prevent compiler from inlining */
          if(i == 0) {
              quux_t *bar = (quux_t *) malloc(sizeof(quux_t));
              return bar;
          } else {
              return foo(0);
          }
      }
    

Seriously? GCC removes this obfuscation at -O2, Clang does it at -O1. Check
the disassembly before making stupid claims like this.

~~~
liuliu
He can use __attribute__ to enforce noinline:
<http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html>

These attributes are GCC-specific (though they work in Clang as well). I have
found them tremendously useful in some cases.

------
goalieca
Well, note how the article said "naive" memory allocation and even then it was
specific to a microbenchmark. I've written very high performing memory
allocation routines for multi-threaded scenarios and if you know the types of
allocations you will be making you can leave any GC or standard memory
allocation routines in the dust!

~~~
agazso
In C or C++ you can even allocate objects on the stack, which has only the
cost of increasing the stack pointer (meaning it is very cheap), and also you
have precise control over the object's lifetime.

~~~
akeefer
Note that by using a technique known as escape analysis, a virtual machine
like the JVM can detect that the lifetime of an object is such that it doesn't
leave the context of a particular method and then stack-allocate the object
implicitly. That may seem like a limited optimization, but when you combine it
with method inlining it gets a lot more useful. I believe the optimization is
turned off by default in the Sun/Oracle JVM but can be enabled via the
-XX:+DoEscapeAnalysis option. For programs that allocate large numbers of
short-lived objects, it can make a significant performance difference.

~~~
whakojacko
Note: If you use scala, escape analysis can be particularly helpful. The "pimp
my library" pattern for extending classes involves creating wrapper objects,
and these can often be optimized away.
<http://www.decodified.com/scala/2010/08/27/scala-rich-wrapping-performance.html>
has more details.

------
mrcharles
The default malloc and free, depending on platform, can be ridiculously slow
operations (in the range of 50-100µs). Those numbers are old; they are from
some testing I did nearly a decade ago. I'd hope it has improved by now.

But at that time, I wrote a memory manager whose malloc and free calls were
at least an order of magnitude faster.

When you need it to be fast, it can be fast. If you don't want it to be fast,
or you have to resort to tricks like preventing the compiler from inlining
(wtf?), then you really are doing it wrong.

~~~
ekidd
Really? One allocation every 50 to 100ms? That would mean you could only do 10
to 20 memory allocations per second. Now, I know that malloc is often very
slow, but that seems unlikely.

Or do you mean 50 to 100 microseconds? That's still horrible, but I could
believe it.

~~~
mrcharles
Oops, yeah, I meant microseconds. Will fix it.

~~~
ekidd
Thank you. Malloc can be pretty bad, but we don't want to start any crazy
rumors about it. :-)

------
svag
A more general discussion about performance between C++ and Java can be found
below:

<http://stackoverflow.com/questions/145110/c-performance-vs-java-c>

~~~
agazso
Found a great comparison among the answers:

<http://zi.fi/shootout/>

It is basically the great programming language shootout at
<http://shootout.alioth.debian.org/> redone with different GCC optimization
levels.

It is clearly visible where Java HotSpot shines, but in most cases C++ wins by
a factor of two. It also describes a technique for doing profiling analysis
with GCC that enables optimizations similar to those HotSpot makes.

~~~
igouy
>> It is basically ... <http://shootout.alioth.debian.org/> <<

Wrong.

It is basically the old Doug Bagley programs which were replaced 3 years
before the zi.fi/shootout article was posted.

A little history - <http://c2.com/cgi/wiki?GreatComputerLanguageShootout>

In this new decade -

<http://shootout.alioth.debian.org/u64q/java.php#faster-programs-measurements>

------
js4all
This reminds me of a discussion from many years ago: "In what cases is C
faster than Assembler". It sounded as weird as the current question, but it
wasn't.

At that time C optimizers were getting better and better, up to the point
where they knew "tricks" that only a very, very good assembler programmer
could match. History seems to repeat itself.

------
badkins
From this article, it seems that the answer is: "When the C++ coder is
intentionally trying to write slow code."

------
alok-g
An excerpt from
<http://www.azulsystems.com/blog/cliff-click/2009-09-06-java-vs-c-performanceagain>

>> "Value Types, such as a 'Complex' type require a full object in Java. This
has both code speed and memory overheads."

I have programmed in C# and C++ (never Java), but I found the above to be the
key issue why my programs would run significantly slower with C#. Here's the
C# language discussion thread where I posted details about my issue:

<http://social.msdn.microsoft.com/Forums/en/csharplanguage/thread/47d10fcb-2d69-451a-bb97-023f1f9113f3>

------
cliffr
Is the test in the article a valid test? In the Java case, wouldn't the
program quit before the garbage collector gets a chance to run? Wouldn't that
be kind of like running the C program without free()?

~~~
marshray
No.

Yes.

Yes.

The benchmark code also re-parses the integer command-line argument on every
iteration of the loop.

This benchmark is meaningless.

------
alecco
Wrong question. It doesn't consider associated drawbacks of the JVM strategy.
Think multi-second pauses for full GC scans on real scenarios instead of
artificial benchmarks.

~~~
axod
I assume you're talking about the GC _defaults_. It's not hard to switch it to
doing incremental garbage collection, or one of the other numerous strategies
available.

~~~
alecco
Incremental GC is yet another JVM trade-off. It trashes the CPU cache all the
time and with 100ns slowdown for each cache miss the gains I've seen on real
life code are not even remotely close to what's advertised.

And Java objects in real life tend to be big and not very local, from what
I've seen.

~~~
axod
Well, everything is a trade off. It depends what you're running.

I use incgc with very good results compared to the defaults. The defaults
usually result in pauses (Not good for something like mibbit), also defaults
simply can't keep up with object churn.

So, anecdotally, incgc works wonderfully.

------
albertzeyer
I guess both Java and OCaml will see that the object can be generated on the
stack which makes it much faster.

By using malloc in the C implementation, you force it to be on the heap.

~~~
thelema314
Actually, OCaml never generates objects like this on the stack - value
creation is always taken literally. The OCaml compiler is really allocating
all those records. It's just that their lifetime is almost zero, so the cost
to clean them up is zero (as none get promoted from the minor heap to the
major heap). And the cost to create them is almost zero, just a += and a
compare.

~~~
albertzeyer
Ah, interesting. Does it also have a GC or some own custom allocation
implementation?

Because if it really would use something like malloc, I don't really see how
it should be faster than the C version.

------
biobot
The JVM is written in C/C++, so technically Java cannot be faster than C/C++,
because everything in Java is a (direct or indirect) product of C/C++. :)

------
zandorg
Java's graphic operations are much faster than some arbitrary C++ library
which hasn't been optimised as well.

~~~
dkersten
X's Y operations are much faster than some arbitrary Z library which hasn't
been optimised as well.

------
rubashov
Java is never definitively faster than C++, because there's nothing Java does
that can't be done in C++, including JIT optimizations.

~~~
sfphotoarts
This is not true. The runtime environment for Java is capable of changing the
program on the fly in response to dynamically determined bottlenecks. To
implement such a system in C would be to re-implement Java.

What you cannot do in C/C++ that you can in Java is respond to program
inefficiencies at runtime, with runtime knowledge.

~~~
rubashov
You are mistaken. There are JIT libraries for C++. There are also dynamic
profiling compilers (intel).

~~~
TheCondor
Are you talking about profile-guided optimization, or a new Intel compiler
that produces an application capable of dynamically rebuilding its own binary
code on the fly, presumably whenever it detects a better way to execute a
chunk of code?

There are JIT libraries for C++ for building applications which dynamically
compile stuff, I'm not aware of any that dynamically recompile the C++ though.

