Hacker News
AMD releases APPML source code, creates clMath library (amd.com)
82 points by jhartmann on Nov 26, 2013 | 15 comments



Even though these libraries are open source, they cannot be built without the proprietary, binary-only "AMD APP SDK" [1], which is only available for Linux and Windows. Bummer.

1. http://developer.amd.com/tools-and-sdks/heterogeneous-comput...


Actually, it needs any OpenCL SDK, not the AMD APP SDK specifically. If you were building for an Altera FPGA or an NVidia card, you would use their OpenCL SDK. An OpenCL implementation is typically an LLVM frontend plus a backend for the hardware in question (or something similar). What they are releasing is in fact all the source for their FFT and BLAS implementations. They aren't opening up the compiler here, just the software that runs on top of it. The compiler itself probably contains a lot of what they consider competitive advantage, so it will probably be a while before they open that.


I concede, but this is a start, and AMD has shown a lot more inclination towards 'open' recently. I think Raja Koduri joining them again is really good for the industry as a whole, not just for AMD.

For a while I was thinking that AMD has not been doing well in the CPU division, but now I feel there is hope and we may see them pull off another Athlon...


I noticed this today while searching around for some OpenCL image routines, and thought it was of general interest to the community. I really think this is an awesome thing; the availability of a high-performance open-source BLAS that can be compiled for a wide array of OpenCL-capable hardware is just great news.
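For context on what clBLAS provides: a level-2 BLAS routine like SGEMV computes y = alpha*A*x + beta*y. A naive scalar reference in plain C++ looks like the sketch below. This is only the mathematical specification, not clBLAS's implementation — clBLAS ships the same operation as OpenCL kernels tuned per device.

```cpp
#include <cstddef>
#include <vector>

// Reference SGEMV: y = alpha*A*x + beta*y, where A is rows x cols, row-major.
// clBLAS implements this same operation as device-tuned OpenCL kernels;
// this scalar loop just states what the result should be.
std::vector<float> sgemv(float alpha, const std::vector<float>& A,
                         const std::vector<float>& x, float beta,
                         std::vector<float> y,
                         std::size_t rows, std::size_t cols) {
    for (std::size_t i = 0; i < rows; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < cols; ++j)
            acc += A[i * cols + j] * x[j];  // dot product of row i with x
        y[i] = alpha * acc + beta * y[i];
    }
    return y;
}
```

The GPU versions win by tiling this loop nest across work-groups and local memory, but the contract is identical.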


I have recently started using ViennaCL (http://viennacl.sourceforge.net/). It has a Boost uBLAS-like interface and has backends for OpenMP, OpenCL, CUDA and uBLAS.

It is in very active development and the community is very nice and helpful.

I also don't think anything similar exists, i.e. a library that can easily run the calculations on both the CPU and the GPU (via OpenCL).


There are also VexCL [0] and Boost.Compute [1]. Both are quite capable. I myself am working on something similar, called Aura, but it still has a long way to go [2]. ViennaCL, VexCL and Boost.Compute focus on maximizing programmer productivity. Converting existing code from, e.g., Matlab to accelerator hardware is trivial using these libraries, and you get excellent performance quickly. Furthermore, NT2 doesn't get nearly as much publicity as it deserves [3]. Not only does it provide an incredible number of ready-to-use functions, it also exploits the vector processing capabilities of modern CPUs. These capabilities are too often ignored by developers or left to suboptimal compiler optimizations.

In my own library Aura, I focus on maximizing performance (over developer convenience). The target audience is developers of real-time applications that need every last drop of performance from their hardware while still maintaining a sane, cross-platform API. I'm aiming for something like a Boost.Asio for accelerator developers. Aura already has a rudimentary wrapper for clFFT; clBLAS is in the works. The idea is to utilize optimal vendor-supplied library functions on each platform and combine them in a coherent interface.

[0] https://github.com/ddemidov/vexcl

[1] https://github.com/kylelutz/compute

[2] https://github.com/sschaetz/aura

[3] https://github.com/MetaScale/nt2


Thanks a lot for the links! Btw., the VexCL author is also active in the ViennaCL community.

At the moment, I want to re-implement some of the functionality of Theano, but in C++. In particular, I also want to implement deep neural networks. And I want to structure my code so that I can easily switch between different computation backends later on, like the CPU or multiple CPUs (hopefully with vector processing), some GPUs (e.g. via OpenCL), or even a multi-machine cluster.

I found many libraries that do one of these very well, but only very few that support multiple backends like ViennaCL does. For example, Boost.Compute only supports OpenCL but not the CPU. VexCL, as far as I understand, also does not support CPU calculations.
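One common way to get that kind of backend switching in C++ is to program the numerical code against a small interface. This is just a sketch of the idea; all names here are hypothetical and not taken from any of the libraries mentioned:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical backend interface: the numerical code sees only this,
// so a CPU, OpenCL, or cluster backend can be swapped in later.
struct Backend {
    virtual ~Backend() = default;
    // y <- a*x + y (the BLAS "axpy" primitive)
    virtual void axpy(float a, const std::vector<float>& x,
                      std::vector<float>& y) = 0;
};

// Plain single-threaded CPU backend. An OpenCL backend would implement
// the same interface by enqueuing a kernel instead; callers don't change.
struct CpuBackend : Backend {
    void axpy(float a, const std::vector<float>& x,
              std::vector<float>& y) override {
        for (std::size_t i = 0; i < y.size(); ++i)
            y[i] += a * x[i];
    }
};
```

The hard part in practice is not the interface but keeping data resident on the device between calls, which is where libraries like ViennaCL earn their keep.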

In what state is Aura? And would it be a good fit for my needs?


I didn't know about Theano, thanks for the hint. I have been focused on C++ libraries only, apparently missing a huge body of work.

CPU is supported through OpenCL by these libraries. Remember, OpenCL code can run on CPUs. As for Aura, it is pre-alpha; there's not a lot of functionality there yet, and I'm still figuring out the interface. So it's not usable yet, but keep an eye on it, it will be.


Oh, I wasn't aware that OpenCL can also run on the CPU; I always thought it was GPU-only. Thanks for pointing that out! How fast is it? Is it comparable to uBLAS or ATLAS? Can it scale to multiple CPU cores? Does it use SSE or similar techniques? Or does that depend on the implementation? What can I expect on common desktop PCs?


I'm not overly familiar with OpenCL on the CPU myself; I only know it works. But people are doing this [0,1]. Whether or not it uses SSE or multiple cores depends on the specific OpenCL backend that is used, but it should use both. I know Intel does this for both their Phi and their regular CPUs.

For me, the most important thing in both CUDA and OpenCL is the programming model. It allows us to describe data-parallel problems and the related data (in)dependence explicitly. Compilers should be (and already are) able to generate efficient code from this. It is not as nice as it could be (we still have to write kernels by hand, etc.), but there are libraries that make our lives easier (we discussed them earlier), and there is also C++ AMP, which tries to integrate better. Yet while we have all these options and can solve most of our problems with more or less effort and elegance, I believe there must be something better out there: the right way to describe data-parallel and task-parallel problems as well as concurrency. Maybe the FP guys are on to something, I don't know. I'll be on the lookout.

[0] http://www.pds.ewi.tudelft.nl/fileadmin/pds/homepages/shenji...

[1] http://comparch.gatech.edu/hparch/papers/lee_plc2013.pdf
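The data-parallel model described above can be caricatured in plain C++: a kernel is a function of a global index, and the runtime maps it over the whole index range. Here the loop is sequential; a real OpenCL or CUDA runtime schedules these work-items in parallel across the device. Names are invented for illustration:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy version of the OpenCL/CUDA execution model: apply a kernel once
// per global index. Sequential here; a GPU runs work-items in parallel.
void launch(std::size_t global_size,
            const std::function<void(std::size_t)>& kernel) {
    for (std::size_t gid = 0; gid < global_size; ++gid)
        kernel(gid);
}

// Element-wise vector add expressed as a "kernel" over the global index,
// the canonical hello-world of this programming model.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    std::vector<float> c(a.size());
    launch(a.size(), [&](std::size_t gid) { c[gid] = a[gid] + b[gid]; });
    return c;
}
```

The point of the model is that the body of the lambda declares no loop-carried dependence, so the runtime is free to execute the iterations in any order, or all at once.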


As cool as this is, it is dated August.

It's still (practically) impossible to run a proper LINPACK benchmark with open-source tools on AMD GPUs, although this is a step in the right direction and, more importantly, a big blow to CUDA.


How exactly is this a blow to CUDA, though? NVIDIA has been shipping CUDA versions of BLAS and FFT (see CUBLAS and CUFFT) for years now.


While CUBLAS is available, it cannot be run on such a wide array of hardware. This is probably of great interest to startups wanting to do things on FPGAs and mobile devices. They now have a well-optimized open-source math library to use as a potential building block, and it has an Apache license. CUBLAS is only good for NVidia cards. So I actually think this could be a big deal, and could reduce the popularity of CUDA over time.


I sure hope it does. I have paid entirely too much for my predecessor's CUDA lessons through the premium NV charges for their cards.

Unfortunately, I'm pretty sure NV is entrenched at this point. My AMD card and my resolution to port everything I needed to use lasted about 3 months, and that was without any CUBLAS dependencies :(


Because there's now more competition?



