And they are doing it again with Vulkan, with a large majority looking for higher-level wrappers instead of dealing directly with its API and its growing number of extensions.
This assertion makes no sense. The whole reason these APIs are specified in C is that once a C API is available, it's trivial to develop bindings in any conceivable language. You can't do that if you opt for a flavor-of-the-month language.
Furthermore, GPGPU applications are performance-driven, and C is unbeatable in this domain. Thus, by providing a standard API in C, and by allowing standard implementations to be written in C, you avoid forcing end users to pay a performance tax just because someone had an irrational fear of C.
One of the reasons CUDA beat OpenCL was that enough people preferred C++ or Fortran to C and CUDA was happy to accommodate them while OpenCL wasn't.
Nonsense. We're talking about an API. C provides a bare-bones interface that doesn't impose any performance penalty, and just because the API is C, nothing stops anyone from implementing the core functionality in whatever they judge to be the best tool for the job.
That's like complaining that an air conditioner doesn't cool the room as well as others just because it uses a type F electrical socket instead of a type C plug.
I don't think we are. We are talking about languages.
In CUDA, compute kernels are written in a language called "CUDA C++." In OpenCL 1.2, compute kernels are written in a language called "OpenCL C." Somebody else could have implemented (and likely did implement) a compiler for their own flavor of C++ for OpenCL kernels, but the point 'pjmlp was making is that the standard platform did not enable C++ for kernels until long after it was available in CUDA.
NVidia, which created a C++ binding for Vulkan rather than using the plain C API, or AMD, which had to create higher-level bindings before the CAD industry would even bother to look into Vulkan.
This blind advocacy for straight C APIs will turn Vulkan into another OpenCL.
CUDA is clearly performance-driven, and it offers a more mature C++ programming model.
Template functions are a type-safe way to build different (but similar) compute kernels. It's far easier to generate constant-folded code with C++ templates, constexpr, and the like than with C-based macros.
In practice, CUDA C++ beats OpenCL C in performance. There's a reason why it is so popular, despite being proprietary and locked down to one company.
The real issue IMO was the split-source model. CUDA's single-source approach (or HCC's) means all your structs and classes work across the GPU and CPU.
If you have a complex datastructure to pass data between the CPU and GPU, you can share all your code on CUDA (or AMD's HCC). But in OpenCL, you have to write a C / C++ / Python version of it, and then rewrite an OpenCL C version of it.
OpenCL C is driven by a runtime compiler, which causes real issues in practice, because that compiler is embedded in the device driver.
Since AMD's OpenCL compiler is buggy, different versions of AMD's drivers will segfault on different sets of code. As in, your single OpenCL program may work on AMD driver 19.1.1 but segfault on 18.7.2.
The single-source, ahead-of-time compilation methodology means compiler bugs stay in developer land. IIRC, NVidia's CUDA compiler also had some bugs, but you can rewrite your code to work around them (or upgrade your developers' compilers when the fix becomes available).
That's simply not possible with OpenCL's model.