Hacker News
Oracle proposes Java GPU support (java.net)
73 points by Mitt 1862 days ago | 16 comments

How is this different from:


Discussed here a couple of days ago?


Native JVM support rather than going through JNI, so you get the sandboxing and safety guarantees of the JVM. Standardized...

Standard coauthored by AMD/ATI.

AMD/ATI has a clear path to fifth-generation computer architecture. Intel doesn't.

Having x86 or ARM CPU cores sharing the same die and memory space as streaming processing cores is a straight path toward a "perception engine" that could speed up machine learning, text analysis, machine vision, and similar tasks by 100x.

I'm hoping for high-end server parts that will help me build generative and discriminative models, but the current Fusion "low-end" parts for smartphones, tablets, and cheap laptops use a fifth of the power of conventional architectures while running GPU-suited tasks -- for instance, video encoding and decoding.

And if we get it all together, you'll be able to run the models I built on a big server to deliver intelligent system capabilities to mobile and desktop apps.

Have you looked at AMD's G-series embedded parts? Now imagine the APUs there (vector units) alongside a couple ARM cores.

I'm not really sure what advantage ARM cores would give, considering that the added complexity is quite significant.

Citation needed on the 1/5 power consumption thing. GDDR5 isn't that power hungry, most modern GPUs can basically turn themselves off, and PCIe isn't that power hungry either.

Anyway, Intel's in better shape than you think and in a significantly better position than AMD, especially after recent acquisitions. The big reason is that for anything above low-end, you don't want to share a die.

Hypothetical architecture: put a future Xeon Phi chip (descendant of Larrabee, formerly Knights Corner, etc.) on QPI. Have its on-die memory controller be based on GDDR5 (or a GDDR5 successor--low capacity, high latency, high bandwidth). Put another standard Xeon next to it, with DDR4 or the DDR4 successor on its memory controller (high capacity, low latency, relatively low bandwidth). Now maybe put an InfiniBand chip from the QLogic acquisition on there, maybe some fast path to an Intel SSD as well, and voila, you've got an all-Intel HPC node with shared CPU/vector processor address space, and you don't even need PCIe.

The idea of a combined CPU/GPU for servers with high performance on either the CPU or the GPU side is a pipe dream for a few reasons:

1. The big driver of GPU performance isn't FLOPs, it's bandwidth. Most of the applications out there on GPUs today are bandwidth limited, not FLOP limited. In other words, the max performance gain you're looking at from a GPU port is on the order of the bandwidth boost, which has been ~2.5x per socket since Nehalem or so.

2. The reason GPUs can get so much bandwidth is because they throw everything else under the bus in the quest for bandwidth--GPU memory latency is an order of magnitude or two higher than CPU memory latency, capacity is painfully limited, everything's soldered down, etc. (The reason why they do this is that sufficiently data-parallel applications can get away with high latency and that GPUs can therefore be big latency-hiding machines.)

3. If you try to use memory with the wrong characteristics for a given processor, you're basically going to cripple that processor. A GPU with 128GB of memory would be cool, but it would provide no benefit for most apps, even with a very fast interconnect between the CPU and GPU. A CPU with 12 or 24GB of GDDR5 would perform terribly due to the inability to hide memory latency and be a complete joke on the marketplace. Building both also doesn't really work due to fab constraints.
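Point 1 can be made concrete with a back-of-envelope roofline estimate. The peak figures below are illustrative round numbers for a GPU of this era, not measurements:

```java
// Back-of-envelope roofline estimate: attainable throughput is capped by
// whichever resource saturates first, flops or memory bandwidth.
public class Roofline {
    public static void main(String[] args) {
        double peakFlops = 4.0e12;       // ~4 TFLOP/s single precision (assumed)
        double peakBandwidth = 250.0e9;  // ~250 GB/s GDDR5 (assumed)

        // SAXPY, y[i] = a * x[i] + y[i]: 2 flops per element,
        // 12 bytes moved per element (read x, read y, write y; 4 bytes each).
        double flopsPerByte = 2.0 / 12.0;

        double attainable = Math.min(peakFlops, flopsPerByte * peakBandwidth);

        System.out.printf("Attainable: %.1f GFLOP/s of %.0f GFLOP/s peak%n",
                attainable / 1e9, peakFlops / 1e9);
        // Bandwidth caps SAXPY at roughly 1% of peak flops here, so the
        // speedup from a GPU port tracks the bandwidth ratio, not the flop ratio.
    }
}
```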

So really, for anything where you're intending to use the GPU/data parallel side as an integrated accelerator rather than an endpoint (that displays graphics to the screen), you want two dies. Intel's in very, very good shape there. (In mobile/low end, you can get away with slower memory for the GPU because you can rely on shared L2/L3 cache to make up for a lot of the perf loss. That is significantly less acceptable for big GPUs dealing with much bigger datasets.)

Gary Frost, the AMD person involved, was also the lead on aparapi, which seems to be in the same vein:


It would probably be great if the GPU support were not much more than something like Rootbeer, but embedded directly in the JVM. It isn't clear to me that they will be able to identify things to run on the GPU automatically.
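For context, aparapi and Rootbeer both expose roughly the same kernel-style programming model: a run() method invoked once per global id, with the runtime deciding where the work executes. A plain-Java sketch of that contract (the Kernel interface and execute() helper here are illustrative stand-ins, not the actual API of either library):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class KernelSketch {

    // One run() invocation per global id; the runtime picks where it executes
    // (here: the fork/join pool; in aparapi: ideally the GPU via OpenCL).
    interface Kernel {
        void run(int globalId);
    }

    static void execute(Kernel kernel, int range) {
        IntStream.range(0, range).parallel().forEach(kernel::run);
    }

    public static void main(String[] args) {
        final float[] a = {1f, 2f, 3f, 4f};
        final float[] b = {10f, 20f, 30f, 40f};
        final float[] sum = new float[a.length];

        // Element-wise vector add: each global id owns one output slot,
        // so the iterations are independent and order-free.
        execute(id -> sum[id] = a[id] + b[id], sum.length);

        System.out.println(Arrays.toString(sum)); // [11.0, 22.0, 33.0, 44.0]
    }
}
```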

It could be possible to add language features that explicitly leverage the parallelism you usually find in GPUs (and CPU vector units).

If I do callableThing.map(iterableThing), I may not always care whether the callable does its magic sequentially on the elements of the iterable. And unless I explicitly use the resulting iterable, I don't care about the order in which the results are returned.
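This map-without-ordering contract is close to what Java's stream API later expressed (streams shipped in Java 8, after this thread). A minimal sketch, CPU parallelism only:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderFreeMap {
    public static void main(String[] args) {
        // .parallel() tells the runtime it may apply the mapping function to
        // elements in any order, on any core; collect() still assembles the
        // results in encounter order.
        List<Integer> squares = IntStream.rangeClosed(1, 8)
                .parallel()
                .map(x -> x * x)
                .boxed()
                .collect(Collectors.toList());

        System.out.println(squares); // [1, 4, 9, 16, 25, 36, 49, 64]
    }
}
```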

While things like http://lwjgl.org/ are great, it would be fantastic, for multiple reasons, for this to be part of the JVM. It would make packaging code that uses the GPU simpler for cross-system releases.

It could bring more desktop/game projects to Java.

As far as I can tell, this is for running JVM code on the GPU (CUDA, etc.), not graphics.

Games are not only about graphics, at least not all of them, lol.

This sounds like a great addition, and it could make games like Minecraft run much better across machines.

This isn't about hardware accelerated graphics for Java; that's already available.
