Open Source GPGPU (github.com)
95 points by luu on June 1, 2014 | 28 comments

Terminology is getting confusing. I thought GPGPU was a term for programming a graphics card to do general-purpose things.

It's basically a GPU without the graphics-specific hardware.

I agree the term seems a bit strange, but people seem to know what it means (Larrabee was often referred to as a GPGPU http://en.wikipedia.org/wiki/Larrabee_(microarchitecture) ). I couldn't think of another term to call it. People also object sometimes to the term "IP Core" on similar grounds, but everyone knows what it means.

Very nice, I'll have to see if I can get this running on my Zedboard.

Or parallella.

Where is this GPGPU used (or going to be used)?

It's a hobby project, mostly to learn more about hardware and computer architecture. I've done a few performance investigations and documented them here: http://latchup.blogspot.com/

You should do some more posts about the Lisp and many-core CPUs you wrote.

In your imagination, where several layers of unneeded abstraction actually accelerate processing.

I wonder if this is competitive with the Intel graphics on their CPUs. Intel GPUs are not exactly stellar.

Nothing running on an FPGA that implements a GPU architecture / instruction set will be competitive with any recent Intel graphics.

Paraphrasing Winston Churchill, "However beautiful the theory, you should occasionally look at the evidence."

Why not?

Due to the overhead of configurability, FPGAs are something like 10x less area-efficient and power-efficient than ASICs.

Also, Intel has poured millions of dollars into architecture research; as impressive as this work is, it probably hasn't had the same amount of R&D behind it.

Intel graphics lagged behind Nvidia and ATI in performance and efficiency for the longest time, despite having a large budget. One of the reasons Intel graphics has been able to make spectacular year-over-year gains in performance and efficiency recently is how bad their GPU was to begin with. You'll never see those kinds of gains from highly optimized chips.

It doesn't seem he has tried taping this out as an ASIC yet. Unless you fab the circuit in the same process Intel uses (which you won't be able to do unless you work for Intel), any performance comparison would be pretty meaningless.

One just needs a wafer scale FPGA.

I suspect that if you asked a microelectronics engineer about that, he would disagree. An FPGA has too much circuitry that isn't needed; there would be leakage and other nasty effects constraining frequency. And wafer scale? Signals need time to move. You can't just scale things up like that. Those are my hunches anyway - we'd need a specialist to say more.

GPUs run at a pretty low clock rate anyway (600 MHz-1100 MHz); you should be able to get 300-400 MHz out of an FPGA. The main issue is interfacing with the GDDR memory, if you can at all.

There were some pushes for wafer-scale stuff in the '80s [0]; I think we are better positioned now, from an algorithms, EDA, and architecture standpoint, to actually make it work. A GPU is actually a pretty good test bed, and eventually GPU and FPGA functionality will merge into a single programmable compute fabric.

Yield on a wafer-scale FPGA could actually be much better than on a special-purpose chip: faulty logic elements / LUTs could just be marked as bad and not used.

[0] http://en.wikipedia.org/wiki/Wafer-scale_integration

> GPUs run at a pretty low clock rate anyway (600 MHz-1100 MHz); you should be able to get 300-400 MHz out of an FPGA

An ASIC (GPUs are ASICs) running at 300 MHz can do a lot more per cycle than an FPGA at the same technology node running at the same frequency. A lot. Think order-of-magnitude more.
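A back-of-envelope sketch of that gap, with the unit counts and the ~10x FPGA logic-density penalty as illustrative rule-of-thumb assumptions rather than measurements:

```python
# Rough FPGA-vs-ASIC comparison at the same technology node and clock.
# The ~10x area/logic-density penalty for FPGAs is a commonly cited
# rule of thumb, used here purely as an illustrative assumption.

def throughput(clock_hz, parallel_units, ops_per_unit_per_cycle):
    """Peak operations per second for a simple parallel datapath."""
    return clock_hz * parallel_units * ops_per_unit_per_cycle

# Same die area, same 300 MHz clock; the ASIC fits ~10x more
# functional units into that area than the FPGA fabric can.
asic = throughput(300e6, parallel_units=160, ops_per_unit_per_cycle=1)
fpga = throughput(300e6, parallel_units=16, ops_per_unit_per_cycle=1)

print(f"ASIC peak: {asic / 1e9:.1f} GOPS")  # 48.0 GOPS
print(f"FPGA peak: {fpga / 1e9:.1f} GOPS")  # 4.8 GOPS
print(f"ratio: {asic / fpga:.0f}x")         # 10x
```

Frequency parity says nothing about per-cycle work, which is where the order-of-magnitude gap lives.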

> The faulty logic elements / LUTs could just be marked as bad and not used.

This will screw up your timing, unless you reserve more setup slack (which in turn hurts the achievable performance).

I've got memories of a lab actually doing a wafer-scale FPGA in the mid-'90s. The reference escapes me, but if I can find it I'll post it. As you suggested, they preserved yield by routing around defects.


Edit: Or it might have been a PGA, rather than an FPGA.

Yes, there would be huge practicality issues with a "wafer-scale" FPGA -- aside from propagation delays, as you note, the chip would have horrible manufacturing yield, because any defect anywhere on the wafer would produce a defective chip. (In contrast, when a wafer has a bunch of tiny chips, a given defect only takes out one small chip, and you still have the other N-1 to use.) Ultimately this drives up cost to the end-user.
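The all-or-nothing yield point can be made concrete with the standard Poisson defect-yield model (a sketch; the defect density and areas are illustrative assumptions, not real process data):

```python
import math

# Standard Poisson defect-yield model: the fraction of defect-free
# dies is Y = exp(-D * A) for defect density D and die area A.

def poisson_yield(defect_density_per_cm2, area_cm2):
    return math.exp(-defect_density_per_cm2 * area_cm2)

D = 0.5             # defects per cm^2 (illustrative)
die_area = 1.0      # a modest 1 cm^2 die
wafer_area = 700.0  # rough usable area of a 300 mm wafer, in cm^2

# All-or-nothing yield: one defect anywhere kills a monolithic chip.
print(f"1 cm^2 die yield:       {poisson_yield(D, die_area):.1%}")
print(f"monolithic wafer yield: {poisson_yield(D, wafer_area):.2e}")

# With redundancy (mark defective blocks bad and route around them),
# the relevant number is instead the expected fraction of good blocks,
# which equals the per-block yield: ~61% of 1 cm^2 blocks usable here,
# rather than a vanishingly small chance of a perfect wafer.
```

This is why the redundancy argument upthread matters: routing around defects turns an essentially-zero yield into a fractional-capacity question.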

Fab equipment also has limits in how big a single chip can be due to "reticle limits", i.e., the size of a single mask exposure, and a chip has to be a single exposure (per layer).

The CAD software (place+route) might actually be a limiter here too. Even if, in theory, you had an FPGA with enough logic blocks to match a modern GPU's raw gate count (i.e., circuit size), an FPGA is just a sea of identical configurable blocks and parts of the circuit are assigned to the blocks and connected with wires using something like simulated annealing. This is why FPGA place+route takes so long. In contrast, a custom chip can put exactly the right gates and wires at exactly the right places, and especially for something like a GPU or a large cache where there are lots of repeated elements, this human involvement makes things much more tractable. (It's also why e.g. the caches on a modern CPU die look so pretty, like a big grid of farms or something -- an FPGA-synthesized CPU cache would look nothing like that, and take much more space!)
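The annealing-based placement step described above can be sketched as a toy program that assigns a small netlist's cells to grid slots while minimizing total Manhattan wirelength (a deliberately simplified illustration; real P&R tools are far more sophisticated):

```python
import math
import random

# Toy FPGA placement by simulated annealing: assign cells to slots on
# a grid, minimizing total Manhattan wirelength of the connecting nets.
random.seed(0)

GRID = 4                # 4x4 grid of configurable blocks
cells = list(range(8))  # 8 cells to place
nets = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]

slots = [(x, y) for x in range(GRID) for y in range(GRID)]
placement = dict(zip(cells, random.sample(slots, len(cells))))

def wirelength(pl):
    return sum(abs(pl[a][0] - pl[b][0]) + abs(pl[a][1] - pl[b][1])
               for a, b in nets)

cost = wirelength(placement)
best_pl, best_cost = dict(placement), cost
temp = 5.0
while temp > 0.01:
    # Propose moving one cell to another slot (swapping if occupied).
    cell = random.choice(cells)
    new_slot = random.choice(slots)
    trial = dict(placement)
    occupant = next((c for c, s in trial.items() if s == new_slot), None)
    if occupant is not None:
        trial[occupant] = trial[cell]
    trial[cell] = new_slot
    new_cost = wirelength(trial)
    # Accept improvements always; accept regressions with probability
    # exp(-delta/T), which shrinks as the temperature cools.
    if new_cost <= cost or random.random() < math.exp((cost - new_cost) / temp):
        placement, cost = trial, new_cost
        if cost < best_cost:
            best_pl, best_cost = dict(placement), cost
    temp *= 0.99

print("best wirelength found:", best_cost)
```

The move-propose/accept loop is the essence of why FPGA place-and-route is slow: at realistic sizes there are millions of blocks and the cost evaluation is far more involved.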

All of that said -- from an academic point of view, this GPGPU core is an impressive piece of work, especially given that it includes an LLVM backend!


FPGAs used for ASIC emulation of larger chips are typically arrays of FPGAs rather than single chips.

You get stuff like this:


I'm not aware of anybody shipping a wafer scale FPGA.

It was an ambitious "just".

Ah, like the 'mere' in 'a mere matter of programming' :)

Now I get it. Sorry for the misunderstanding, appreciate the expansion.

Wow, this is nothing short of incredible.

I wonder if an OpenCL stack can be made.
