Hello, Windows 10 user here. How can I be sure I'm using an optimized ATLAS? I tried building it earlier, but with no success. I found some info on the ATLAS forum for older versions, but couldn't get it to work.
Easy: if it works at all, it is using the ATLAS that is installed on your system, so it will use whatever optimized ATLAS you provided.
That means you have to build it yourself, which is problematic on Windows, BUT the author recently released version 3.10.3, and one of the major improvements is that it builds fine on Windows with Cygwin and MinGW.
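If you want an empirical sanity check on top of that, one option is to time a matrix multiply and compare the achieved throughput against a deliberately slow baseline. Below is a minimal sketch in plain Python, not anything shipped with ATLAS: `naive_matmul` and `measure_gflops` are hypothetical helper names, and you would pass your own BLAS-backed multiply in place of the baseline. A tuned ATLAS should come out far ahead of a naive loop.

```python
import time


def naive_matmul(a, b):
    """Reference triple-loop multiply of nested lists (deliberately slow baseline)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def measure_gflops(matmul, n, repeats=3):
    """Time matmul on two n x n inputs and return the best observed GFLOPS.

    Assumes the usual 2 * n^3 floating-point operations per dense multiply.
    """
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-12)  # guard against a zero reading from a coarse timer
    return 2.0 * n ** 3 / best / 1e9


if __name__ == "__main__":
    print("naive baseline: %.4f GFLOPS" % measure_gflops(naive_matmul, 64))
```

Run the same measurement with a function that calls into your installed BLAS and compare the two numbers; the absolute values depend entirely on your machine.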
clBLAS is, in my opinion, hard to build AND hard to integrate.
On top of that, this approach gives better performance in most cases even on AMD, and especially on Nvidia. Now, I have AMD hardware, but I think it is better to build a library that covers more hardware overall, so I avoided clBLAS :)
When I need to write my own OpenCL kernels, I use ClojureCL - it gives me easy management while still retaining full control of the kernels and their performance.
I found that the latest version of clBLAS on Fiji achieves a fantastic ~4 Tflops (on 2^n matrices). NVblas has probably had more resources allocated to it than clBLAS. I'd be positively surprised if the kernels in Neanderthal beat those. Do you plan on adding benchmarks for the GPU calls? I can help run the clBLAS benchmarks, if you like, since I have a tuned setup.
If you want to take a look at it, the AutoGemm generator seems to be a simple Python script written to overcome the limitations of the C preprocessor. I was considering using its tiling structure, since I already have a Lisp->OpenCL compiler in place (and have had no luck beating it). See, for instance,
I think it's about 5.6 Tflops. Wow, 3.75 Tflops on Hawaii is very good indeed; I agree this is not something that clBLAS would beat by a wide margin, if at all.
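For anyone comparing the Tflops figures quoted in this thread, here is a rough sketch of the arithmetic behind them, assuming the usual convention of 2*m*n*k floating-point operations for a dense matrix multiply. The function names and the 8192 x 8192 example size are my own illustration, not taken from any of the benchmarks above.

```python
def gemm_flop_count(m, n, k):
    """Operation count for C = A * B with A (m x k) and B (k x n).

    Each of the m * n outputs needs k multiplies and k adds.
    """
    return 2 * m * n * k


def achieved_tflops(m, n, k, seconds):
    """Convert a measured GEMM wall time into Tflops."""
    return gemm_flop_count(m, n, k) / seconds / 1e12


def seconds_at(m, n, k, tflops):
    """Time one GEMM would take at a given sustained Tflops rate."""
    return gemm_flop_count(m, n, k) / (tflops * 1e12)


if __name__ == "__main__":
    size = 8192  # hypothetical 2^n matrix size
    for rate in (3.75, 4.0):  # rates quoted in the thread
        print("%.2f Tflops: %.4f s per multiply" % (rate, seconds_at(size, size, size, rate)))
```

This makes it easy to turn a wall-clock time from your own benchmark run into a number that can be compared with the ones posted here, as long as everyone counts operations the same way.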
Of course. Sparse operations are on the TO DO list. I would have already added them if I needed them, so there are two options:
1) Wait until I need them.
1a) Become active in the community and bug me often enough that I realize how important it is :)
2) Contribute sparse library integrations (I'll help).