CUDA is clearly performance-driven, and its C++ model is more mature than OpenCL's.
Template functions are a type-safe way to build different (but similar) compute kernels. It's far easier to use C++ templates, constexpr, and the like to generate specialized code at compile time than to use C-style macros.
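To make the template point concrete, here is a rough sketch in plain host C++17 (not actual CUDA, so it compiles anywhere). The function name `scale` and the `ClampToZero` flag are made up for illustration. In CUDA C++ the same pattern would presumably go on a `__global__` function indexed by thread instead of a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical "kernel" body: out[i] = a * in[i], optionally clamped at zero.
// T is the element type; ClampToZero is a compile-time switch.
// Each <T, ClampToZero> combination is compiled as its own specialized function,
// which is the job a C macro would otherwise be doing, but with type checking.
template <typename T, bool ClampToZero>
void scale(T a, const T* in, T* out, std::size_t n) {
    static_assert(std::is_arithmetic_v<T>, "T must be a numeric type");
    assert((in != nullptr && out != nullptr) || n == 0);

    for (std::size_t i = 0; i < n; ++i) {
        T v = a * in[i];
        // `if constexpr` is resolved at compile time: when ClampToZero is false,
        // this branch is discarded and no comparison is emitted in the loop.
        if constexpr (ClampToZero) {
            if (v < T(0)) {
                v = T(0);
            }
        }
        out[i] = v;
    }
}
```

Calling `scale<float, true>(...)` and `scale<int, false>(...)` gives you two separately compiled variants from one definition, and passing a non-numeric `T` fails at compile time with a readable message rather than a wall of macro-expansion errors.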
In practice, CUDA C++ beats OpenCL C in performance. There's a reason why it is so popular, despite being proprietary and locked down to one company.