I’m not saying that you cannot write vector code, but that it’s typically a special case. CUDA APIs and annotations are bolted onto existing languages rather than reflecting languages with vector operations as natural first class operations.
C or Java have no concept of `a + b` being a vector operation the way a language like, say, APL does. You can come closer in C++, but in the end the memory model of C and C++ hobbles you. FORTRAN is better in this regard.
It is always possible to inline assembler in C, and present vector operators as functions in a library.
Otherwise, R does perceive vectors, so another language that performs well might be a better choice. Julia comes to mind, but I have little familiarity with it.
With Java, linking the JRE via JNI would be an (ugly) option.
C or Java have no concept of `a + b` being a vector operation the way a language like, say, APL does. You can come closer in C++, but in the end the memory model of C and C++ hobbles you. FORTRAN is better in this regard.