Thanks, I hadn't heard of that; I will look into it. This is a research setting with plenty of hardware we can request and not a huge number of users, so that part doesn't worry me.
> This is a research setting with plenty of hardware we can request and not a huge number of users
If you don’t care about cost of ownership, use CUDA. It only runs on NVIDIA GPUs, but the API is nice. I like it better than vendor-agnostic equivalents like DirectCompute, OpenCL, or Vulkan Compute.
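To give a sense of the ergonomics: here's a minimal complete CUDA program (SAXPY) using standard CUDA runtime APIs. The single-source model and the `<<<grid, blocks>>>` launch syntax are the parts that feel nicer than OpenCL, where kernel source lives in separate strings and launches go through an explicit API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread handles one element: y[i] = a * x[i] + y[i]
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified (managed) memory is accessible from both host and device,
    // so no explicit cudaMemcpy calls are needed here.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch a grid of 256-thread blocks covering all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // 2.0 * 1.0 + 2.0 = 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Compile with `nvcc saxpy.cu -o saxpy`. The equivalent OpenCL program is several times longer before you get to the actual kernel logic.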