> Looks like this more about adapt existing CUDA code to work with AMD. So it ma...

raphlinus · on Nov 16, 2021

This doesn't sound right to me. Vulkan 1.2 has support for pointers through an extension[1], and that's getting more widely available (certainly a lot more cards than can run ROCm). There's also support coming down the pike for buffer device address[2], which basically lets you use pointers to refer to resources when submitting work to the GPU instead of having to create bindings.

If an open source project of this scope came to me for advice, I would absolutely 100% recommend they base their stuff on Vulkan. It's getting very good quite quickly; there's basically nothing holding you back any more if you're willing to use the latest extensions. There is a tendency in open source to move more slowly though, and I understand that.

I agree with you that GLSL/HLSL is a bad programming language, but you can still get your work done in it. I'm personally excited about rust-gpu, but that is not ready yet.

[1]: https://www.khronos.org/registry/vulkan/specs/1.2-extensions...

[2]: https://community.arm.com/arm-community-blogs/b/graphics-gam...

my123 · on Nov 16, 2021

These are not proper pointers, see: https://github.com/google/clspv/blob/master/docs/OpenCLCOnVu...

> Pointers are an abstract type - they have no known compile-time size. Using a pointer type as the argument to the sizeof() operator will result in an undefined value.

> Pointer-to-integer casts must not be used.

> Integer-to-pointer casts must not be used.

> Pointers must not be compared for equality or inequality

The first one in that list is a quite big limitation, affecting a significant number of scenarios…

(And yes, you cannot actually make a compliant OpenCL 1.2 implementation on top of Vulkan, Vulkan has too few features for that purpose)
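To make the quoted restrictions concrete, here is a rough sketch in plain host-side C++ of three everyday idioms that each rely on one of them. The helper names (`is_aligned`, `sum_range`, `Node`) are made up for illustration and are not taken from clspv or any real codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Alignment check: relies on a pointer-to-integer cast.
// `alignment` is assumed to be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Iterating a half-open range [first, last): relies on comparing
// two pointers for inequality.
inline float sum_range(const float* first, const float* last) {
    float total = 0.0f;
    for (const float* it = first; it != last; ++it) {
        total += *it;
    }
    return total;
}

// A struct that embeds a pointer: its layout only makes sense if the
// pointer has a size known at compile time.
struct Node {
    float value;
    Node* next;
};
constexpr std::size_t kNodeBytes = sizeof(Node);
```

None of this is exotic on the CPU, which is roughly why the restrictions bite when porting existing code.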

raphlinus · on Nov 16, 2021

Thanks for the detailed explanation, that makes sense and is useful information. My understanding was that they were more like real pointers, but I see that is not the case.

boulos · on Nov 16, 2021

You'd recommend Vulkan Compute over "CUDA" though for a path tracer? (I don't assume Cycles has any hybrid rendering in it, so there isn't much value in the traditional pipeline)

raphlinus · on Nov 16, 2021

So it depends a lot on the goals. CUDA is a very good developer experience, and Vulkan compute shaders are (at the moment) a very bad one. But if the goal is to ship real compute on a wide variety of devices, Vulkan is pretty close to the only game in town.

I also expect the language support and tooling to get a lot better. The fact that it's based on SPIR-V means it doesn't lock you into any one language, it's possible to develop good tools that run on top of it. That's happening to a large extent in the machine learning space with IREE, much less so here though.

I should also say, I'm not disagreeing with the strategy Blender has chosen, specifically getting vendors to support tuned ports to their hardware. I'm just saying that these specific criticisms of Vulkan (like lacking pointers) aren't really valid.

pjmlp · on Nov 16, 2021

If I have learned anything from Khronos APIs, it is that if the Internet were a Khronos standard instead of an IETF one, we would only have IP as the standard, while everyone else had to come up with TCP, UDP, HTTP,... as extensions.

Language support and tooling are never better than the alternatives.

my123 · on Nov 16, 2021

Blender decided to support just the platforms which have a C++ target device language.

This encompasses CUDA, Metal[1] (one of the reasons why it’s much more usable than Vulkan), ROCm HIP, and oneAPI[2].

[1] Metal’s Shading Language is C++14 with a handful of limitations, the biggest one is no lambdas

[2] Vulkan uses a restricted SPIR-V dialect without pointers notably. OpenCL and oneAPI use a separate one which _does_ support them. However, AMD[3] and NVIDIA do not implement SPIR-V in their OpenCL drivers.

[3] it’s a trainwreck, they supported the original SPIR but then https://community.amd.com/t5/opencl/spir-support-in-new-driv... happened. It never came back since then.

Migrating back down to GLSL/HLSL from that makes absolutely no sense. The options above, using C++ as a device language, allow much more code sharing with a CPU backend.

Portability is also a good story, the adaptation is mostly in the glue layer, if your GPU vendor is _not_ AMD and has a proper software stack.

pjmlp · on Nov 16, 2021

Indeed, and a reason why Khronos only adopted SPIR-V and is now slowly embracing C++.

They took a hard beating from proprietary APIs that have long moved away from "C is the best" approach, and now they are playing catch-up with the rest of the world cozy in their mature C++ tooling for the GPGPU.

my123 · on Nov 16, 2021

Note that Microsoft had C++ AMP on that front, https://docs.microsoft.com/en-us/cpp/parallel/amp/cpp-amp-ov.... That worked on DirectX devices as a whole.

However, MS failed to capitalise on it and bailed out (too early). It has been limping along, effectively dead, for years, and was finally acknowledged as deprecated in VS2022.

pjmlp · on Nov 16, 2021

I think that they decided to capitalize on HLSL, DirectX Compute, and mesh shaders instead, given the lack of love for C++AMP at ISO.

floatboth · on Nov 16, 2021

Why would anyone want pointers on the GPU?

DonHopkins · on Nov 16, 2021

That reminds me of the line from the HP technical support person from the "X-Windows Disaster" chapter of the "Unix-Haters" handbook:

https://donhopkins.medium.com/the-x-windows-disaster-128d398...

>My super 3D graphics, then, runs only on /dev/crt1, and X windows runs only on /dev/crt0. Of course, this means I cannot move my mouse over to the 3d graphics display, but as the HP technical support person said “Why would you ever need to point to something that you’ve drawn in 3D?”

>Of course, HP claims X has a mode which allows you to run X in the overlay planes and “see through” to the graphics planes underneath. But of course, after 3 months of calls to HP technical support, we agreed that that doesn’t actually work with my particular hardware configuration. You see, I have the top-of-the-line Turbo SRX model (not one, but two on a single workstation!), and they’ve only tested it on the simpler, less advanced configurations. When you’ve got a hip, forward-thinking software innovator like Hewlett-Packard, they think running X windows release 2 is pretty advanced.

WithinReason · on Nov 16, 2021

Except in this case the question makes sense, you don't have malloc on the GPU (it only allocates from a fixed array), and without dynamic memory allocation you could just use indices to an array any time you would use a pointer on the CPU. This also makes sense if you transfer your datastructure between GPU and CPU, since you don't need to translate pointers.
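A minimal sketch of the index-instead-of-pointer idea, written as plain C++ so it runs on the CPU. The `TreeNode` layout, `kNoChild` sentinel and `sum_tree` helper are hypothetical names chosen for illustration. Children are stored as indices into the same flat array, so the whole structure can be copied byte for byte without fixing up any addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel index meaning "no child" (an assumption of this sketch).
constexpr std::int32_t kNoChild = -1;

// Children are indices into the node array rather than pointers.
struct TreeNode {
    float value;
    std::int32_t left;
    std::int32_t right;
};

// Sums every value reachable from `root`, using an explicit stack
// instead of recursion.
inline float sum_tree(const std::vector<TreeNode>& nodes, std::int32_t root) {
    float total = 0.0f;
    std::vector<std::int32_t> stack;
    if (root != kNoChild) {
        stack.push_back(root);
    }
    while (!stack.empty()) {
        const std::int32_t i = stack.back();
        stack.pop_back();
        assert(i >= 0 && static_cast<std::size_t>(i) < nodes.size());
        const TreeNode& n = nodes[static_cast<std::size_t>(i)];
        total += n.value;
        if (n.left != kNoChild) stack.push_back(n.left);
        if (n.right != kNoChild) stack.push_back(n.right);
    }
    return total;
}
```

Copying the `std::vector<TreeNode>` to another buffer leaves every `left`/`right` link valid, which is the property being described above. The trade-off is that every access needs the base array in scope.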

my123 · on Nov 16, 2021

You share the same pointers between CPU and GPU. Pointers in structures for example are one of the use cases.

Also, as a more unusual option, see cudaHostRegister (on CUDA), which makes a CPU buffer that was not allocated through CUDA accessible from the GPU without a copy, at the same address[1]. This isn’t especially high performance, because accesses have to go through the PCIe bus (on Tegra, this mechanism is even more attractive), but it is very useful.

Your address space is unified between CPU and GPU, with the same pointers used between both.

[1] Getting the same address for host allocations registered through that mechanism is only supported on recent GPUs. Allocations done through CUDA still share the same address on both sides, even on GPUs where this is unsupported.

DonHopkins · on Nov 16, 2021

Right, and you can simply use your finger to point at things you drew on your top-of-the-line Turbo SRX 3D display, instead of your cursor.

What does Brecht Van Lommel know about writing 3D graphics code and GPU kernels, anyway?

https://www.blenderdiplom.com/en/interviews/400-interview-br...

https://devtalk.blender.org/t/2021-08-31-blender-rendering-m...

So the hip, forward-thinking HP technical support person was right after all, decades ago! Makes me wonder why they're not still in business. Such nice hardware! Too bad about the software.

my123 · on Nov 16, 2021

Among other things: C++ on the GPU, with lots of common code between the two worlds.

It’s one of the core reasons why CUDA took off. You could adapt your code instead of a full rewrite, even mixing code within the same file. Types are checked at compile time too.
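A rough sketch of what that code sharing tends to look like. The `HOST_DEVICE` macro name and the `Vec3` helpers are assumptions for illustration. `__CUDACC__` is the macro nvcc defines while compiling CUDA sources, and with an ordinary C++ compiler the annotation simply expands to nothing.

```cpp
#include <cassert>

// Under nvcc the functions below are emitted for both CPU and GPU.
// Under a regular C++ compiler the macro is empty and they are plain
// inline functions.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

struct Vec3 {
    float x, y, z;
};

// Shared math helper: one definition, type-checked once, usable from
// host code and (when built with nvcc) from kernels.
HOST_DEVICE inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Linear interpolation between a and b, with t expected in [0, 1].
HOST_DEVICE inline Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    return Vec3{a.x + (b.x - a.x) * t,
                a.y + (b.y - a.y) * t,
                a.z + (b.z - a.z) * t};
}
```

The same header can then be included from the CPU backend and from the kernel source, so only the launch and memory-management glue differs per platform.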

shmerl · on Nov 16, 2021

I'd take Rust over C++ for the GPU.

my123 · on Nov 16, 2021

I don’t see why you’d want that, but if you want to do it you can.

nvptx64-nvidia-cuda is a Tier 2 Rust target. (see https://doc.rust-lang.org/nightly/rustc/platform-support.htm...)

shmerl · on Nov 16, 2021

I'd ask why wouldn't you want that?

I think this is more promising: https://github.com/EmbarkStudios/rust-gpu

floatboth · on Nov 17, 2021

…and this one is targeting SPIR-V for Vulkan, not some nvptx-stuff :)

MrLeap · on Nov 16, 2021

It would be nice if I could use function pointers in HLSL. It would let me pass pointers to distance functions around. Doing anything complicated with signed distance fields is a bit harebrained without them.
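For illustration, this is roughly the pattern being asked for, sketched in CPU-side C++ rather than HLSL. The names (`DistanceFn`, `sphere_sdf`, `plane_sdf`, `march`) and the step and distance limits are invented for this example. The marcher takes the distance function as a parameter, so new shapes plug in without touching it.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// A distance function maps a point to its signed distance from a surface.
using DistanceFn = float (*)(Vec3);

// Unit sphere centred at the origin.
inline float sphere_sdf(Vec3 p) {
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z) - 1.0f;
}

// Ground plane at y = 0.
inline float plane_sdf(Vec3 p) {
    return p.y;
}

// Sphere-traces along origin + t * dir using whichever distance function
// is passed in. Returns the hit distance t, or -1 if nothing is hit.
inline float march(DistanceFn sdf, Vec3 origin, Vec3 dir,
                   int max_steps = 64, float max_dist = 100.0f) {
    assert(sdf != nullptr);
    float t = 0.0f;
    for (int i = 0; i < max_steps && t < max_dist; ++i) {
        const Vec3 p{origin.x + dir.x * t,
                     origin.y + dir.y * t,
                     origin.z + dir.z * t};
        const float d = sdf(p);
        if (d < 1e-4f) {
            return t;  // close enough to the surface
        }
        t += d;  // safe step: nothing is closer than d
    }
    return -1.0f;
}
```

Without function pointers, the usual fallback is a `switch` over a shape ID inside the marcher, which means editing the marcher for every new shape.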