Hacker News
Hadoop should target C++/LLVM, not Java (because of watts) (trendcaller.com)
15 points by tfincannon 273 days ago | 18 comments


9 points by pieter 273 days ago | link

It seems like he makes two errors:

1) He thinks that you can retarget C++ code compiled to LLVM BC to multiple targets. That's not the case: you can only JIT LLVM BC on the platform you were targeting when you compiled your C++ code. That means that you might just as well take LLVM out of the picture, and compare Java vs. C++.

2) Java vs. C++ on 'work/watt' is almost the same as Java vs. C++ on 'work/time', given that you keep the same machine to run it on. You'll have to keep CPU utilization maximised for both, which doesn't seem unreasonable. The only thing you then have to care about is memory utilization. If Java takes (for example) 2x as much memory as C++, then just look up how much power the extra memory uses and add that to your measure. All the other stuff (Background tasks, JIT'ing, GC'ing, whatever) is already part of the 'work/time' benchmark, so you don't have to care about that.
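As a rough illustration of the bookkeeping in point 2, here is a minimal sketch. The function name, its parameters, and every figure used with it are hypothetical placeholders, not measurements of any real system.

```cpp
// Sketch: fold the power cost of extra memory into a throughput benchmark.
// All names here are illustrative assumptions, not taken from the article.

// Requests completed per joule, given sustained throughput, the average
// power drawn by the machine, and the power drawn by any additional memory.
double work_per_joule(double requests_per_second,
                      double machine_watts,
                      double extra_memory_watts) {
    // 1 W = 1 J/s, so (requests/s) / (J/s) = requests/J.
    return requests_per_second / (machine_watts + extra_memory_watts);
}
```

With made-up inputs of 1000 requests/s at 200 W for one implementation and 900 requests/s at 200 W plus 25 W of extra memory for the other, this yields 5.0 and 4.0 requests per joule respectively. Only the throughput and the memory delta enter the comparison; JIT and GC overhead are already reflected in the throughput figure.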

-----

1 point by bad_user 273 days ago | link

"That means that you might just as well take LLVM out of the picture, and compare Java vs. C++."

Not really, if you can dynamically load libraries at runtime, then you can separate the platform-dependent code from the main codebase, making it possible to target multiple targets without recompilation.

"The only thing you then have to care about is memory utilization"

I don't know much about the JVM internals, but isn't JIT'ing a more expensive process (as in CPU + memory) for the JVM? If you keep the CPU load maximized for both, won't more of those resources be used by the JVM's JIT'ing and GC'ing?

-----

5 points by pieter 272 days ago | link

"Not really, if you can dynamically load libraries at runtime, then you can separate the platform-dependent code from the main codebase, making it possible to target multiple targets without recompilation."

It's not just libraries. It's also stuff like the size of pointers and integers and the way that structs are aligned and packed.
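A small sketch of why this matters: the values below are fixed by the C++ front end for one particular target before any IR is emitted, so the resulting bitcode already carries them as constants. `Record`, `LayoutInfo`, and `describe_layout` are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// A struct whose layout depends on the target ABI.
struct Record {
    char  tag;      // 1 byte, usually followed by padding
    void* payload;  // typically 4 bytes on 32-bit targets, 8 on 64-bit
    long  count;    // typically 4 bytes on 32-bit Linux and 64-bit Windows,
                    // 8 bytes on 64-bit Linux
};

// The layout facts the compiler baked in for the target it compiled for.
struct LayoutInfo {
    std::size_t pointer_size;
    std::size_t long_size;
    std::size_t record_size;
    std::size_t payload_offset;
};

LayoutInfo describe_layout() {
    // sizeof and offsetof are compile-time constants, so they are
    // resolved per target and cannot change when the code is loaded.
    return {sizeof(void*), sizeof(long), sizeof(Record),
            offsetof(Record, payload)};
}
```

Compiling this for a 32-bit and a 64-bit target will usually give different numbers for every field, which is the sense in which the compiled output is tied to one platform.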

"I don't know much about the JVM internals, but isn't JIT'ing a more expensive process (as in CPU + memory) for the JVM? If you keep the CPU load maximized for both, won't more of those resources be used by the JVM's JIT'ing and GC'ing?"

It doesn't matter how the JIT works if you just measure performance. The JIT is just an internal thing, which you don't have to care about if you do the right benchmarks. The only thing you're interested in is stuff like requests/second, and you shouldn't care how that is handled (through a JIT, interpreter, or native code).

-----

1 point by chibea 273 days ago | link

Right, it would be more interesting to talk about memory consumption and if that might add to power consumption.

-----

9 points by akeefer 273 days ago | link

It would be a far more compelling argument if there were actual numbers in there showing that for those kinds of loads Java does in fact require more wattage; for long-running processes, once the VM has JITed everything down to machine code, one might reasonably expect that it would be pretty close to the consumption of a natively compiled C++ application. He could turn out to be right, I just don't think you can take it as self-evident that the Java VM must be using more power.

-----

6 points by chibea 273 days ago | link

Are there any compiler features which change the ratio of power consumption to clock cycles? If yes, that's what he should have talked about and given some evidence for.

If no, let's assume power consumption is a linear function of clock cycles and focus on cycles.

He claims the JIT would cause 'periodically run background optimizer tasks'. He does not actually state whether he is talking about single- or multi-processor systems. Let's first assume he is talking about a single-processor system. Then the background tasks actually run on the same CPU and are thus counted in the benchmarks. If he is instead talking about a multi-processor (or multi-core) system, he claims that even in a single-threaded benchmark there might be hidden costs (hidden, since runtime is improved by 'stealing' cycles from another CPU) inflicted by these background tasks. He is probably right here, since besides the JIT there is, e.g., garbage collection, which might in fact be running concurrently on another thread. If you only measure the runtime of such a benchmark, it won't be an accurate measure of the power consumption. But if you measure the CPU time/cycles of the benchmark, all the threads are taken into account, with all the 'hidden' costs. He provides no evidence that the benchmarks failed to do this.
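The difference between the two measurements can be sketched as follows. `Timing` and `measure` are hypothetical names; the sketch times work inside its own process, and measuring a JVM would need the equivalent OS-level accounting. It assumes a POSIX-like system, where `std::clock` typically reports processor time summed over all threads of the process (on Windows it reports elapsed time instead).

```cpp
#include <chrono>
#include <ctime>

struct Timing {
    double wall_seconds;  // elapsed real time
    double cpu_seconds;   // processor time charged to the whole process
};

// Runs `work` once and reports both clocks.
template <typename F>
Timing measure(F&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    return {
        std::chrono::duration<double>(wall_end - wall_start).count(),
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC,
    };
}
```

If helper threads do work on other cores while `work` runs, `cpu_seconds` can exceed `wall_seconds`; that excess is the 'hidden' cost a runtime-only benchmark would miss.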

What he could have meant is that some benchmarks use a 'warm-up' to have all the hot spots JITed before measuring. He's right here: for short-running tasks, JIT time has to be taken into account. For long-running (massive data processing) tasks it is at least not obvious whether JITing is slower or faster than ahead-of-time compiling. There are some opportunities for a JIT compiler to actually use fewer cycles than AOT-compiled code. E.g., a virtual method call in C++ generally has to go through the virtual function table, and this indirection might introduce additional cycles due to bad locality. A JIT compiler that knows a particular interface call at a call site is in fact monomorphic can optimistically skip one level of indirection and thus save cycles.

Vendors of JIT compilers put much effort into balancing JIT overhead against possible performance gain. One can assume that they know how to create meaningful benchmarks. If in doubt, provide counter-examples.
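The monomorphic call-site idea can be imitated by hand in C++. This is only a sketch of the shape of the optimization, with hypothetical types; a real JIT compares a class pointer in the object header and patches the code if the guess turns out wrong, whereas the `typeid` check here still reads the vtable and saves nothing by itself. AOT compilers can also devirtualize in some cases, for example when the class is `final`.

```cpp
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square final : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

struct Circle final : Shape {
    explicit Circle(double r) : radius(r) {}
    double area() const override { return 3.14159265358979 * radius * radius; }
    double radius;
};

// Plain virtual dispatch: the call goes through the vtable.
double area_virtual(const Shape& s) {
    return s.area();
}

// Hand-written imitation of a JIT's guarded fast path for a call site
// that has only ever seen Square.
double area_guarded(const Shape& s) {
    if (typeid(s) == typeid(Square)) {
        const Square& sq = static_cast<const Square&>(s);
        return sq.side * sq.side;  // inlined body, no indirect call
    }
    return s.area();  // guard failed: fall back to virtual dispatch
}
```

Both functions return the same result for any `Shape`; the guarded version only changes how the common case is reached.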

BTW: The correct unit of comparison would have been work/joule and not work/watt.

-----

1 point by Nelson69 272 days ago | link

There are compiler research projects to build power optimizing compilers rather than just pure performance optimizing compilers. I was going to post a link or two but when I googled I found dozens and dozens of them, so take your pick.

I don't have a good idea of what kind of difference that they can make yet. The article's whole premise seems to be that C++ code is linearly faster than Java code so that linear difference must also be there for power consumption. The differences in power consumption for memory are pretty small. We'd need some numbers to judge whether or not that's interesting and it'd also be interesting to see a JVM optimized for power consumption and how that factored in to the equation. The disk usage is probably the largest user of energy in these systems, presumably a C implementation and Java implementation would use the same disk layout so disk usage should be roughly the same.

The Hypertable guys think the performance differences are worth it to use C/C++, but they don't mention power consumption differences.

-----

1 point by wglb 272 days ago | link

"BTW: The correct unit of comparison would have been work/joule and not work/watt."

Well, work in this context seems to imply scaled computations/second and since watts is a per-second unit, work/watt would be the right measure.
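The two positions can be reconciled by writing out the units, assuming "work" is read as a throughput (units of work per second):

\[
\frac{\text{work}/\text{s}}{\text{W}}
  = \frac{\text{work}/\text{s}}{\text{J}/\text{s}}
  = \frac{\text{work}}{\text{J}}
\]

So throughput per watt and total work per joule are the same quantity. The two only differ if "work" means a fixed amount of completed work, in which case dividing by watts leaves a stray factor of time.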

-----

3 points by lars 273 days ago | link

His argument seems to be that the JIT somehow makes Java less efficient. That's plain wrong. Yes, running the JIT will take some processor juice, but you run it because you'll save processor time in the long run.

Yes, Java will probably use more watts than your C++ equivalent, but that's in spite of the JIT, not because of it.

-----

3 points by barrettcolin 273 days ago | link

Maybe I'm missing something here: Java is bad because of "dirty little secret(s) of JIT technology". But doesn't LLVM IR need to be JIT-ed as well?

-----

2 points by wglb 272 days ago | link

Perhaps this is right. If it were done in C++, then the project might be later, and thus would save by delaying many node purchases, or even entire projects.

On the other hand, the author of the article does not factor into the equation calorie consumption of the programmers, nor the differential calorie consumption of Java programmers vs C++ programmers. Perhaps there is enough differential to close the gap.

-----

1 point by tezza 272 days ago | link

Easy targets have been picked as the base metrics.

This article reminds me of the genius who thought Google could save the world by making their homepage background black instead of white.[1]

The author should also consider how secure the JVM sandbox model is. What is the Time/Watt cost of securing the boxen if the JVM is ditched?

---

[1] http://hardware.slashdot.org/article.pl?sid=07/07/27/054249

-----

1 point by babo 273 days ago | link

Does anybody have experience with Hypertable, which the author mentions?

-----

1 point by vicaya 272 days ago | link

Raise hand, I'm a Hypertable developer.

-----

1 point by kqr2 273 days ago | link

Ironically, Google uses Java (although Dalvik is a modified JVM) for Android, whose target audience is definitely concerned with power.

Android also includes a lot of application hooks for power management.

-----

2 points by tezza 272 days ago | link

Dalvik was introduced to work around various Java-on-mobile licensing restrictions.

kqr2, you probably know this; this quick reply is just for other readers who don't know the specifics.

-----

1 point by antirez 273 days ago | link

Well, Google does not pay the bill for the user's phone. Still, I would like to see actual measurements of the power usage of Java vs. C++ apps.

-----

2 points by chibea 273 days ago | link

Right, Google does not pay the bill, but that's not the point of Android. It's meant as a platform for developers, who then don't have to worry about CPU specifics. Just as important, it is meant as a platform for mobile device vendors. Still, Google had to make sure that Dalvik has reasonable performance compared to native apps. Even if Android Java apps are 2-10x slower than the corresponding native applications, that's OK, because further savings will come with the next level of power-saving CPUs for mobile devices, which will be able to run current Android apps unchanged.

-----



