CUDA is pretty well-trodden ground; you aren’t going to do much better there (if at all) than cuBLAS and cuDNN. But I get what you’re saying, gotta pick one’s battles.
My understanding is that it's less about competing with cuBLAS and cuDNN directly and more about offering the features they expose in a better, more idiomatic way. There's a reason writing C++ AMP code is less fun and more tedious.