The problem with GPUs is that I/O latency is very high compared to your average supercomputer. You can do a crazy amount of computation locally on one card, but for problems that aren't "embarrassingly parallel", i.e. those that require a lot of low-latency inter-node communication, you'll immediately be limited by latency.
If Nvidia or AMD release GPU-based stream processors with onboard or daughterboard interconnects directly accessible from the code running on the GPU, THAT's when they'll start eating into CPU market share.
If you're buying a supercomputer, make sure you spend at least 50% of the budget on the interconnect, or you're in for a big disappointment.
Supercomputers are already focused on "embarrassingly parallel" problems; otherwise 300,000 cores aren't going to do much for you anyway. However, I agree that interconnect speed would be a major issue for many supercomputer workloads. Still, I suspect that if you had access to a $10+ million supercomputer built from a million GPU cores, plenty of people would love to work with such a beast.
No, these are not just racks and racks of individual machines. They present the programmer with a single system image - it "looks" like one huge expanse of memory.
We have a Blue Gene at Argonne, and it's not SSI. It's also not designed for embarrassingly parallel workloads; you use libraries like MPI to run tightly coupled message-passing applications (which are very sensitive to latency). You can, and people have, run many-task-type applications on it too.
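To illustrate what "tightly coupled" means here: every iteration of the computation blocks on a message from a neighbor, so per-message latency lands directly on the critical path. A toy stdlib-Python sketch of that pattern, with threads and queues standing in for MPI ranks and the network (the worker and its arguments are made up for illustration, not a real MPI API):

```python
import threading
import queue

def worker(inbox, outbox, value, steps, results, idx):
    """One 'rank' of a toy relaxation: exchange a boundary value with the
    neighbor every step, then average with it. The blocking get() is what
    makes the computation latency-sensitive."""
    for _ in range(steps):
        outbox.put(value)            # send our boundary to the neighbor
        neighbor = inbox.get()       # block until the neighbor's arrives
        value = (value + neighbor) / 2.0
    results[idx] = value

# Two "ranks" connected by a pair of one-way channels.
q01, q10 = queue.Queue(), queue.Queue()
results = [None, None]
t0 = threading.Thread(target=worker, args=(q10, q01, 0.0, 20, results, 0))
t1 = threading.Thread(target=worker, args=(q01, q10, 100.0, 20, results, 1))
t0.start(); t1.start()
t0.join(); t1.join()
print(results)  # both values relax to the mean: [50.0, 50.0]
```

With 20 communication rounds on the critical path, total runtime scales with round-trip latency no matter how fast each local step is - which is exactly why these codes punish a slow interconnect.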
The basic speed-of-light limitation means that accessing distant nodes is going to have high latency even if there is reasonable bandwidth. Ignoring that is a bad idea from an efficiency standpoint. And, unlike PC programming, the cost of the machine makes people far more focused on optimizing their code for the architecture than on abstracting the architecture away to help the developer out.
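The back-of-envelope numbers make the point: even with an idealized signal at vacuum light speed (real links in fiber or copper are slower, and switches and NICs add much more), just crossing a large machine room costs on the order of a hundred nanoseconds - hundreds of clock cycles. The distance and clock rate below are illustrative assumptions:

```python
# Light-speed floor on cross-machine latency, ignoring all switching/NIC
# overhead (which in practice dominates).
c = 3.0e8            # speed of light in vacuum, m/s
distance_m = 30.0    # assumed span of a large machine room
one_way_s = distance_m / c

one_way_ns = one_way_s * 1e9       # 100.0 ns one way
cycles = one_way_s * 3.0e9         # ~300 cycles on an assumed 3 GHz core
print(one_way_ns, cycles)
```

So a single remote access burns hundreds of cycles before any real hardware overhead is counted, which is why nobody building these machines can afford to pretend the topology doesn't exist.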
It takes care of it to some extent, but you still have to be aware of it as the programmer. MPI and its associated infrastructure are set up to pick nodes so that the network topology and your code's communication topology are well matched. But you have to do your best as a programmer to hide the latency by doing other useful work while messages are in flight.
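In MPI terms, hiding latency usually means the nonblocking pattern: post MPI_Irecv/MPI_Isend early, compute on the data you already have, and only then MPI_Wait. A stdlib-Python sketch of the same idea, with a thread and a sleep standing in for the network (function and variable names here are made up for illustration):

```python
import threading
import time

def fetch_halo(result):
    """Stand-in for a nonblocking receive: the sleep models network
    latency, the append models the boundary data arriving."""
    time.sleep(0.05)
    result.append([1.0, 2.0])

halo = []
req = threading.Thread(target=fetch_halo, args=(halo,))
req.start()                                     # "post the receive" early

# Meanwhile, do the interior work that doesn't need the boundary data.
interior = sum(x * x for x in range(100_000))

req.join()                                      # "wait" only for what's left
boundary = sum(halo[0])
print(interior, boundary)
```

If the interior computation takes longer than the message is in flight, the communication cost disappears from the critical path entirely; if not, you still only pay for the uncovered remainder.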