

Code Optimization Techniques for Graphics Processing Units - ColinWright
http://hgpu.org/?p=6614

======
thebigredjay
Link to source, which includes slides and examples:
<http://homepages.dcc.ufmg.br/~fpereira/classes/gpuOpt/>

You might also be interested in the work of a prof at the University of
Alberta, Jose Nelson Amaral.

A Complete Description of the UnPython and Jit4GPU Framework
<https://www.cs.ualberta.ca/system/files/tech_report/2011/GargAmaralGPU-TR.pdf>

Jit4OpenCL: A Compiler from Python to OpenCL
<http://webdocs.cs.ualberta.ca/~amaral/thesis/XunhaoLiMSc.pdf>

------
6ren
Would it be economical to manufacture a GPU with hundreds, thousands, or even
tens of thousands of processing elements, but with a much lower clock (say, 100 MHz)?

Cores are physically quite small, and a lower clock rate reduces power issues.
I suspect it would also increase yield rates (perhaps by using thicker
structures on a finer process, e.g. 45nm structures on a 32nm process).

One barrier may be that practitioners have few techniques for such massive
parallelism (a catch-22). On the other hand, it seems certain that
manufacturers have done their sums and worked out that they can deliver
greater performance with their present core-count/clock tradeoff.
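
To make that tradeoff concrete, here is a rough back-of-envelope sketch (every
constant below is a made-up assumption, not real silicon data): dynamic power
scales roughly as C * V^2 * f, and a lower clock usually tolerates a lower
supply voltage, so a sea of slow cores can come out ahead on throughput per
watt even though a single fast core wins on latency.

    # Rough model: dynamic power ~ C * V^2 * f, throughput ~ cores * f.
    # Every constant below is an illustrative assumption, not measured data.
    def design_point(cores, freq_ghz, volts):
        power = cores * volts**2 * freq_ghz    # arbitrary capacitance units
        throughput = cores * freq_ghz          # assumes 1 op/cycle per core
        return throughput, power

    for name, cores, freq, volts in [("few fast cores", 512, 1.5, 1.0),
                                     ("many slow cores", 10000, 0.1, 0.8)]:
        tput, watts = design_point(cores, freq, volts)
        print(f"{name}: throughput={tput:.0f}, power={watts:.0f}, "
              f"perf/W={tput / watts:.2f}")

Of course this only pays off if the workload actually scales to that many
threads, which is exactly the catch-22 above.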

~~~
wtallis
Yield problems hurt superlinearly as you scale up the size of a chip. That's
why the fastest graphics cards have two GPUs on board, and why two mid-range
cards will usually offer a better price/performance ratio than the biggest,
fastest single-GPU card. Besides, the biggest GPUs out there are already
little more than arrays of hundreds or thousands of vector processors: AMD's
current biggest is 2.6B transistors divided among 1536 shader processors
running at 800-900 MHz, and NVidia's biggest is 3B transistors divided among
512 shader processors running at 1.5 GHz.
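
A quick way to see the "superlinear" part is a textbook defect-density yield
model; the Poisson form and the defect density below are illustrative
assumptions, not foundry numbers:

    import math

    # Poisson yield model: Y = exp(-D0 * A), D0 in defects/cm^2, A in cm^2.
    D0 = 0.25   # assumed defect density for a reasonably mature process

    for area_mm2 in (130, 260, 520):
        y = math.exp(-D0 * area_mm2 / 100.0)
        print(f"{area_mm2} mm^2 die: ~{y:.0%} defect-free")

Under this model, doubling the die area squares the per-die survival
probability and also halves the number of candidate dies per wafer, so good
dies per wafer drop off much faster than linearly.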

That NVidia monster has a die size of about 520 mm^2. At that size, there's
already a lot of waste because wafers are round and the chips are rectangular,
and that can only be reduced by making physically smaller chips. (Rumor has it
that by the time NVidia's 529 mm^2 GF100 chip was originally supposed to
launch, yields were bad enough that they were getting only about 2 usable
chips per 300 mm wafer. The cost of chip production scales pretty much
linearly with the number of wafers processed, so that _really_ hurt.)
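
For the round-wafer waste, the usual dies-per-wafer approximation shows how
the edge-loss term grows with die size (the formula is the standard
approximation; the die sizes are just illustrative, the largest being the
~520 mm^2 figure quoted above):

    import math

    def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
        # Area term minus an edge-loss term proportional to the
        # wafer circumference divided by the die "diagonal".
        d = wafer_diameter_mm
        return int(math.pi * (d / 2)**2 / die_area_mm2
                   - math.pi * d / math.sqrt(2 * die_area_mm2))

    for area in (100, 250, 520):
        print(f"{area} mm^2: ~{gross_dies_per_wafer(area)} gross dies "
              f"per 300 mm wafer")

With these rough numbers, the edge term eats roughly a fifth of the candidate
dies at ~520 mm^2 but only about a tenth at ~100 mm^2, which is the geometric
half of why smaller chips are so much cheaper per unit.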

