The real issue IMO was the split-source. CUDA's single source or HCC's single-source means all your structs and classes work between the GPU and CPU.
If you have a complex datastructure to pass data between the CPU and GPU, you can share all your code on CUDA (or AMD's HCC). But in OpenCL, you have to write a C / C++ / Python version of it, and then rewrite an OpenCL C version of it.
OpenCL C is driven by this interpreter / runtime compiler thingy, which just causes issues in practice. The compiler is embedded into the device driver.
Since AMD's OpenCL compiler is buggy, this means that different versions of AMD's drivers will segfault on different sets of code. As in, your single OpenCL program may work on 19.1.1 AMD Drivers, but it may segfault on version 18.7.2.
The single-source compile-ahead-of-time methodology means that compiler bugs stay in developer land. IIRC, NVidia CUDA also had some bugs, but you can just rewrite your code to handle it (or upgrade your developer's compilers when the fix becomes available).
That's simply not possible with OpenCL's model.