

Hadoop should target C++/LLVM, not Java (because of watts) - tfincannon
http://www.trendcaller.com/2009/05/hadoop-should-target-cllvm-not-java.html

======
chibea
Are there any compiler features which change the ratio of power consumption to
clock cycles? If yes, that's what he should have talked about and given some
evidence for.

If no, let's assume power consumption is a linear function of clock cycles and
focus on cycles.

He claims JIT would cause 'periodically run background optimizer tasks'. He
does not actually state if he is talking about single or multi-processor
systems. Let's assume first, he is talking about a single-processor system.
Then, the background tasks are actually run on the same CPU and thus are
counted in the benchmarks. If he, instead, is talking about a multi-processor
(-core) system, he claims that even in a single threaded benchmark, there
might be hidden cost (hidden, since runtime is improved by 'stealing' cycles
from another CPU) inflicted by these background tasks. He is probably right
here, since besides JIT, there is e.g. garbage collection which in fact might
be running concurrently on another thread. If you only measure runtime of such
a benchmark it won't be an accurate measure of the power consumption. But: If
you measure CPU-time/cycles of the benchmark all the threads are taken into
account with all the 'hidden' costs. He provides no evidence that the
benchmarks failed to do this.
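
The wall-clock versus CPU-time distinction can be made concrete with a small
sketch. It is written in Python purely for brevity and does not reproduce any
benchmark from the article; `measure` and `busy_loop` are hypothetical names.
`time.perf_counter()` reports elapsed time only, while `time.process_time()`
reports CPU time for the whole process, summed over all of its threads.

```python
import time


def measure(task):
    """Run task() and return (wall_seconds, cpu_seconds).

    wall_seconds: elapsed real time.
    cpu_seconds:  user + system CPU time of the whole process, which
                  includes every thread the process runs.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return wall_seconds, cpu_seconds


def busy_loop(n=200_000):
    # Hypothetical stand-in for the benchmarked workload.
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    wall, cpu = measure(busy_loop)
    print(f"wall-clock: {wall:.4f} s, CPU time: {cpu:.4f} s")
```

If helper threads were burning cycles on other cores during the run, the CPU
figure would exceed the wall-clock figure, which is the 'hidden' cost described
above.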

What he could have meant is that some benchmarks use 'warm-up' to have all the
hot spots JITed before measuring. He's right here: for short running tasks JIT
has to be taken into account. For long running (massive data processing) tasks
it is at least not obvious that JITing is slower or faster than ahead-of-time
compiling. There are some opportunities for a JIT-compiler to actually use
fewer cycles than AOT-compiled code. E.g. a virtual method call in C++ has to
use the virtual function table always, this indirection might introduce
additional cycles due to bad locality. A JIT compiler knowing that a
particular interface call at a call-site in fact is monomorphic, can
optimistically skip one level of indirection and thus save cycles.
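
The monomorphic call-site idea can be illustrated with a toy inline cache.
This is a sketch in Python, not how any particular JVM implements it;
`CallSite`, `Circle` and `Square` are made-up names. The call site remembers
the one receiver type it has seen and, as long as a cheap type check passes,
calls the cached target directly instead of repeating the full lookup.

```python
import math


class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r * self.r


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class CallSite:
    """Toy model of one call site with a monomorphic inline cache."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None    # the single receiver type seen so far
        self.cached_target = None  # function resolved for that type
        self.fast_hits = 0
        self.slow_lookups = 0

    def call(self, receiver, *args):
        if type(receiver) is self.cached_type:
            # Guard passed: go straight to the cached target.
            self.fast_hits += 1
            return self.cached_target(receiver, *args)
        # Guard failed: do the full dynamic lookup, then re-cache.
        self.slow_lookups += 1
        target = getattr(type(receiver), self.method_name)
        self.cached_type = type(receiver)
        self.cached_target = target
        return target(receiver, *args)


if __name__ == "__main__":
    site = CallSite("area")
    for _ in range(1000):
        site.call(Circle(1.0))
    print(site.fast_hits, site.slow_lookups)  # 999 1
```

A real JIT would compile the guard and the direct call into machine code and
fall back to slower code when the guard fails; the counters here only show how
rarely the slow path is taken when the site stays monomorphic.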

The vendors of JIT compilers put much effort into the balance between JIT
overhead and possible performance gain. One can assume that they know how to
create meaningful benchmarks. If in doubt, provide counter-examples.

BTW: The correct unit of comparison would have been work/joule and not
work/watt.
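
To see why the unit matters: a watt is a joule per second, so energy is
average power multiplied by runtime. A small sketch with made-up numbers (the
function names are hypothetical and the figures are not measurements):

```python
def energy_joules(avg_power_watts, runtime_seconds):
    # Energy = average power x time (1 W = 1 J/s).
    return avg_power_watts * runtime_seconds


def work_per_joule(work_units, avg_power_watts, runtime_seconds):
    return work_units / energy_joules(avg_power_watts, runtime_seconds)


if __name__ == "__main__":
    # Same 200 W draw and same 1000 records, but different runtimes.
    print(work_per_joule(1000, 200, 10))  # 0.5 records per joule
    print(work_per_joule(1000, 200, 20))  # 0.25 records per joule
```

Both runs draw the same watts, yet the slower one uses twice the energy for
the same work.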

~~~
Nelson69
There are compiler research projects to build power optimizing compilers
rather than just pure performance optimizing compilers. I was going to post a
link or two but when I googled I found dozens and dozens of them, so take your
pick.

I don't have a good idea of what kind of difference they can make yet.
The article's whole premise seems to be that C++ code is linearly faster than
Java code so that linear difference must also be there for power consumption.
The differences in power consumption for memory are pretty small. We'd need
some numbers to judge whether or not that's interesting and it'd also be
interesting to see a JVM optimized for power consumption and how that factored
into the equation. Disk usage is probably the largest user of energy in
these systems, presumably a C implementation and Java implementation would use
the same disk layout so disk usage should be roughly the same.

The Hypertable guys think the performance differences are worth it to use
C/C++ but they don't mention power consumption differences.

------
pieter
It seems like he makes two errors:

1) He thinks that you can retarget C++ code compiled to LLVM BC to multiple
targets. That's not the case, you can only JIT LLVM BC to the platform you're
targeting with your C++ code. That means that you might just as well take
LLVM out of the picture, and compare Java vs. C++.

2) Java vs. C++ on 'work/watt' is almost the same as Java vs. C++ on
'work/time', given that you keep the same machine to run it on. You'll have to
keep CPU utilization maximised for both, which doesn't seem unreasonable. The
only thing you then have to care about is memory utilization. If Java takes
(for example) 2x as much memory as C++, then just look up how much power the
extra memory uses and add that to your measure. All the other stuff
(Background tasks, JIT'ing, GC'ing, whatever) is already part of the
'work/time' benchmark, so you don't have to care about that.
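
Point 2 amounts to a small calculation: at a fixed power draw, watts divided
by requests per second gives joules per request, with the extra memory power
added on top. A sketch with invented figures (none of these numbers come from
the article or from measurements; `joules_per_request` is a hypothetical
helper):

```python
def joules_per_request(requests_per_second, base_power_watts,
                       extra_memory_watts=0.0):
    # watts / (requests per second) = joules per request
    return (base_power_watts + extra_memory_watts) / requests_per_second


if __name__ == "__main__":
    # Assumed: same machine at full CPU load (200 W) for both runs;
    # the second run needs extra DIMMs drawing an assumed 10 W.
    native = joules_per_request(1000, 200.0)
    managed = joules_per_request(900, 200.0, extra_memory_watts=10.0)
    print(f"native:  {native:.3f} J/request")
    print(f"managed: {managed:.3f} J/request")
```

JIT and GC overhead never appears as a separate term: it is already reflected
in the measured requests per second.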

~~~
bad_user
_"That means that you might just as well take LLVM out of the picture, and
compare Java vs. C++."_

Not really, if you can dynamically load libraries at runtime, then you can
separate the platform-dependent code from the main codebase, making it
possible to target multiple targets without recompilation.

 _"The only thing you then have to care about is memory utilization"_

I don't know much about the JVM internals, but isn't JIT'ing a more expensive
process (as in CPU + memory) for the JVM? If you keep the CPU load maximized
for both, won't more of those resources be used by the JVM's JIT'ing and
GC'ing?

~~~
pieter
"Not really, if you can dynamically load libraries at runtime, then you can
separate the platform-dependent code from the main codebase, making it
possible to target multiple targets without recompilation."

It's not just libraries. It's also stuff like the size of pointers and integers
and the way that structs are aligned and packed.
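
These properties can be inspected directly. The sketch below uses Python's
`ctypes` to report the sizes and field offsets that the platform's C ABI gives
a hypothetical struct `Record`. The values differ between, say, 32-bit and
64-bit targets, and C or C++ code compiled to bitcode has typically already
baked them in.

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical struct: char tag; void *payload; int count;
    _fields_ = [
        ("tag", ctypes.c_char),
        ("payload", ctypes.c_void_p),
        ("count", ctypes.c_int),
    ]


def layout_report():
    """Return platform-dependent sizes and offsets as a dict."""
    return {
        "sizeof(void*)": ctypes.sizeof(ctypes.c_void_p),
        "sizeof(long)": ctypes.sizeof(ctypes.c_long),
        "sizeof(Record)": ctypes.sizeof(Record),
        "offsetof(payload)": Record.payload.offset,
        "offsetof(count)": Record.count.offset,
    }


if __name__ == "__main__":
    for name, value in layout_report().items():
        print(f"{name} = {value}")
```

The padding after `tag` depends on the pointer alignment of the platform, so
even the struct's total size is not portable.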

"I don't know much about the JVM internals, but isn't JIT'ing a more expensive
process (as in CPU + memory) for the JVM? If you keep the CPU load maximized
for both, won't more of those resources be used by the JVM's JIT'ing and
GC'ing?"

It doesn't matter how the JIT works, if you just measure performance. The JIT
is just an internal thing, which you don't have to care about if you do the
right benchmarks. The only thing you're interested in is stuff like
requests/second, and you shouldn't care how that is handled (through a JIT,
interpreter, or native code).

------
akeefer
It would be a far more compelling argument if there were actual numbers in
there showing that for those kinds of loads Java does in fact require more
wattage; for long-running processes, once the VM has JITed everything down to
machine code, one might reasonably expect that it would be pretty close to the
consumption of a natively compiled C++ application. He could turn out to be
right, I just don't think you can take it as self-evident that the Java VM
must be using more power.

------
lars
His argument seems to be that the JIT somehow makes Java less efficient.
That's plain wrong. Yes, running the JIT will take some processor juice, but
you run it because you'll save processor time in the long run.

Yes, Java will probably use more watts than your C++ equivalent, but that's
in spite of the JIT, not because of it.

------
wglb
Perhaps this is right. If it were done in C++, then the project might be
later, and thus would save by delaying many node purchases, or even entire
projects.

On the other hand, the author of the article does not factor into the equation
calorie consumption of the programmers, nor the differential calorie
consumption of Java programmers vs C++ programmers. Perhaps there is enough
differential to close the gap.

------
barrettcolin
Maybe I'm missing something here: Java is bad because of "dirty little
secret(s) of JIT technology". But doesn't LLVM IR need to be JIT-ed as well?

------
tezza
Easy targets have been picked as the base metrics.

This article reminds me of the genius who thought Google could save the world
by making their homepage background _black_ instead of white.[1]

The author should also consider how secure the JVM sandbox model is. What is
the Time/Watt cost of securing the boxen if the JVM is ditched?

---

[1] <http://hardware.slashdot.org/article.pl?sid=07/07/27/054249>

------
kqr2
Ironically, Google uses Java (although Dalvik is a modified JVM) for Android,
whose target audience is definitely concerned with power.

Android also includes a lot of application hooks for power management.

~~~
antirez
Well, Google does not pay the bill for the users' phones. Still, I would like
to see actual measurements of power usage of Java vs. C++ apps.

~~~
chibea
Right, Google does not pay the bill, but that's not the point of Android. It's
meant as a platform for developers, who then don't have to worry about
CPU specifics. Just as important, it is meant as a platform for mobile device
vendors. Still, Google had to make sure that Dalvik has reasonable
performance compared to native apps. Even if Android Java apps are 2-10x
slower than corresponding native applications, that's OK, because further
savings will come with the next level of power-saving CPUs for mobile devices,
which will be able to run current Android apps unchanged.

------
babo
Does anybody have experience with Hypertable, which the author mentions?

~~~
vicaya
Raise hand, I'm a Hypertable developer.

