> This is in part because of the work by Google on the NVPTX LLVM back-end.
I'm one of the maintainers at Google of the LLVM NVPTX backend. Happy to answer questions about it.
As background, Nvidia's CUDA ("CUDA C++?") compiler, nvcc, uses a fork of LLVM as its backend. Clang can also compile CUDA code, using regular upstream LLVM as its backend. The relevant backend in LLVM was originally contributed by Nvidia, but these days the team I'm on at Google is the main contributor.
I don't know much (okay, anything) about Julia except what I read in this blog post, but the dynamic specialization looks a lot like XLA, a JIT backend for TensorFlow that I work on. So that's cool; I'm happy to see this work.
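(If you're curious what "dynamic specialization" means here, this is a toy sketch in plain Python of the compile-once-per-type-signature idea; all the names are made up, and a real system like Julia's JIT or XLA would emit LLVM IR / PTX at the "compile" step instead of just reusing the Python function.)

```python
# Toy sketch of type-based dynamic specialization: keep one compiled
# variant per argument-type signature and reuse it on later calls.
compiled = {}

def specialize(fn):
    def wrapper(*args):
        sig = tuple(type(a) for a in args)
        if sig not in compiled:
            # A real JIT would lower `fn` to LLVM IR here and hand it
            # to a backend (e.g. NVPTX); we just record the variant.
            compiled[sig] = fn
        return compiled[sig](*args)
    return wrapper

@specialize
def add(a, b):
    return a + b

add(1, 2)        # compiles the (int, int) variant
add(1.0, 2.0)    # compiles a separate (float, float) variant
add(3, 4)        # cache hit, no recompilation
print(len(compiled))  # 2
```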
Full debug information is not supported by the LLVM NVPTX back-end yet, so cuda-gdb will not work yet.
We'd love help with this. :)
Bounds-checked arrays are not supported yet, due to a bug [1] in the NVIDIA PTX compiler. [0]
We ran into what appears to be the same issue [2] about a year and a half ago. Nvidia is well aware of the issue, but I don't expect a fix except by upgrading to Volta hardware.
Does this mean we could hook Cython up to NVPTX as the backend?
I've always thought it weird that I'm writing all my code in this language that compiles to C++, with semantics for any type declaration, etc. And then I write chunks of code in strings, like an animal.
IDK about Cython, but I remember a blog post using Python's AST reflection to jit to LLVM -> NVPTX -> PTX. It's relatively simple to do; I've done it for LDC/D/DCompute[1,2,3]. It's a little trickier if you want to be able to express shared memory, surfaces & textures, but it should still be doable.
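(For the curious, the reflection step such a JIT starts from can be sketched in plain Python with the stdlib `ast` module; the kernel source below is made up for illustration, and the actual lowering to LLVM IR and PTX is elided.)

```python
import ast

# Hypothetical kernel source a JIT might receive; the name `kernel`
# is illustrative, not from any real framework.
src = """
def kernel(a, b):
    return a + b
"""

tree = ast.parse(src)
func = tree.body[0]

# A real pipeline would now walk the tree, lower each node to LLVM IR,
# and ask the NVPTX backend for PTX; here we just show the reflection.
print(func.name)                         # kernel
print([arg.arg for arg in func.args.args])  # ['a', 'b']
```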
I spent a while looking at debug information for NVPTX last year and came to the conclusion that it is, luckily, DWARF, with some weird serialisation for the assembler.
The NVPTX backend would benefit, imo, from moving towards the more general LLVM infrastructure, so that emitting the DWARF info is not another special case.
We'd like this too. Unfortunately a lot of the special cases can't be eliminated because we have to interface with ptxas, the closed-source PTX -> SASS (GPU machine code) optimizing assembler.
Yeah, I know, and the DWARF info special cases are even worse for ptxas. I never had enough time, but Nvidia surprisingly has a lot of information on it out there.
nvcc installed on a Mac seems tied to the current clang, and the latest clangs don't support CUDA development, so I have to downgrade my clang to an older version to use CUDA. Why is nvcc tied to clang?
To be clear, there are two ways to compile CUDA (C++) code. You can either use nvcc (which itself may use clang), or you can use regular, vanilla clang, without ever involving nvcc.
Nvidia's closed-source compiler, nvcc, uses your host (i.e. CPU) compiler (gcc or clang) because it transforms your input .cu file into two files, one of which it compiles for the GPU (using a program called cicc), and the other of which it compiles for the CPU using the host compiler.
The other way to do it is to use regular open-source clang without ever involving nvcc. The version of clang that comes with your Xcode may not be new enough (I dunno), but the LLVM 5.0 release should be plenty new, unless you want to target CUDA 9, in which case you'll need to build from head.
I don't know the technical reasons why nvcc is so closely tied to the host compiler version -- it annoys me sometimes, too.
[0] https://julialang.org/blog/2017/03/cudanative

[1] https://github.com/JuliaGPU/CUDAnative.jl/issues/4

[2] https://bugs.llvm.org/show_bug.cgi?id=27738