

Implementing Level-3 BLAS Routines in OpenCL on Different Processing Units - redknight666
http://hgpu.org/?p=12996

======
stephencanon
Title appears incorrect; from what I can see, the article makes no claim that
this work is faster than cuBLAS. There are some claimed speedups relative to
clBLAS [note: "cl", not "cu"], and some references to other work that claims
speedups over cuBLAS.

~~~
mdda
Agreed, the title should be changed. Perhaps the submitter saw the Nvidia
cards in there and read this as an OpenCL vs. CUDA comparison, missing the
principal cool factor of OpenCL: it interoperates across all these devices
(GPUs and CPUs, despite Nvidia making it difficult to find in their SDK).

