motorolnik's comments

motorolnik · 2024-10-26T08:48:01 1729932481

Hi, I've got few questions:

1). What profiling tools do you use for GPU code?

2). Where one would start, in terms of learning resources, about coding using inline GPU assembler?

3). Do you verify GPU assembler generated by a compiler from C/C++ code, in terms of effectiveness? If so, which tools do you use for that?

4). Is SIMD on GPUs a thing?

5). What are the primary factors being taken into account by you (cache sizes, microoptimizations, etc.) when you write code for a tool like gpuowl/prpll? Which factor is the most important? Thanks!

mpreda · 2024-10-26T10:38:02 1729939082

1. My profiling is rudimentary but effective. I measure per-kernel execution time with OpenCL events (which register with high accuracy start/end times w. practically no overhead), and also I continously measure per-iteration time by dividing wall-time for blocks of 20'000 iterations by that nb. These measuremens are consistent and sensitive.

2. I'm not aware of good learning resources. Explore existing such code, e.g. opencl miners tend to use asm. Read in amdgpu/ in LLVM. Disassemble code from OpenCL and read the ISA. Explore and experiment, but it's tedious. I would not recommend to jump into ISA initially. BTW AMD does have good GCN ISA docs available online, that is useful!

3. Yes I often read the compiled ISA, and over time I discover bugs and also better understand the ISA.

4. OpenCL is SIMD, and yes it matches the GPU HW.

5. most important is to reduce the number of registers used (#VGPRs), as that influences heavilly the occupancy of the kernel. Use fewer costly instructions such as FP64 mul/FMA. Sequential memory access, and in general reduce global memory access as it's very slow. Merge small kernels into one (keep the data in the kernel). Never spill VGPRs.

motorolnik · 2024-10-26T08:57:49 1729933069

And another more general question: (6) gcc, clang, and nvcc have some OpenMP offloading capabilities which allow to compile code into binaries which can then run on GPUs. Is the code they produce through OpenMP anywhere close to what one gets directly with i.e. opencl?

mpreda · 2024-10-26T10:52:38 1729939958

I don't know, I haven't eplored OpenMP myself.. maybe some day.

motorolnik · on Dec 19, 2019

Hi, Nick, can I ask you about that your 2017/2018 HFT project from another thread?