Parallel Thread Execution. Think of its instructions as opcodes for NVIDIA GPUs.
They are a bit more complex than your traditional CPU opcodes (the lowest level of abstraction accessible to users), as you can specify cache behavior, memory barriers, etc.
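A minimal sketch (mine, not DeepSeek's code) of what that looks like in practice: CUDA C++ lets you drop to PTX through inline asm, and that is where these extra knobs live. ld.global.cg (cache in L2 only, bypassing L1) and membar.gl (GPU-scope memory barrier) are both documented; the kernel and variable names are my own.

```cuda
#include <cstdio>

__global__ void ptx_demo(const float* in, float* out) {
    float v;
    // Documented cache operator: cache the load in L2 only, bypassing L1.
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(in));
    // Documented memory barrier at GPU (device) scope.
    asm volatile("membar.gl;");
    *out = v * 2.0f;
}

int main() {
    float *in, *out, h = 21.0f;
    cudaMalloc(&in, sizeof(float));
    cudaMalloc(&out, sizeof(float));
    cudaMemcpy(in, &h, sizeof(float), cudaMemcpyHostToDevice);
    ptx_demo<<<1, 1>>>(in, out);
    cudaMemcpy(&h, out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", h);  // 42.0
    return 0;
}
```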
There are documented combinations of parameters for those instructions, but if you fuzz (search for new combinations, randomly or systematically, hoping some will work the way you want) you can find undocumented ones with unexpected effects or with advantages of various kinds (not polluting caches, speed, ...). A sketch of that idea follows below.
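Hedged sketch of that fuzzing loop, under my own assumptions (DeepSeek hasn't published how they searched): enumerate modifier combinations, wrap each in a tiny PTX kernel, and ask ptxas whether it assembles for a given arch. The modifier lists and file name are illustrative; a real search would also run the survivors and measure behavior.

```cuda
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

int main() {
    // Candidate modifiers; some combinations are documented, others are not.
    std::vector<std::string> l1 = {"", ".L1::no_allocate", ".L1::evict_last"};
    std::vector<std::string> l2 = {"", ".L2::64B", ".L2::128B", ".L2::256B"};
    for (const auto& a : l1) {
        for (const auto& b : l2) {
            std::string op = "ld.global.nc" + a + b;
            // A minimal PTX module containing one load with the candidate op.
            std::string ptx =
                ".version 8.0\n.target sm_90\n.address_size 64\n"
                ".visible .entry k(.param .u64 p) {\n"
                "  .reg .b64 %rd<2>;\n  .reg .f32 %f<2>;\n"
                "  ld.param.u64 %rd1, [p];\n"
                "  " + op + ".f32 %f1, [%rd1];\n"
                "  ret;\n}\n";
            FILE* f = fopen("cand.ptx", "w");
            fputs(ptx.c_str(), f);
            fclose(f);
            // Does ptxas accept it for Hopper? Assembling is only step one;
            // you still have to check what it does and how fast it is.
            int rc = system("ptxas -arch=sm_90 cand.ptx -o /dev/null 2>/dev/null");
            printf("%-45s %s\n", op.c_str(), rc == 0 ? "assembles" : "rejected");
        }
    }
    return 0;
}
```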
That is the case, for example, for ld.global.nc.L1::no_allocate.L2::256B, which DeepSeek uses: it provides significant acceleration while being reliable (although it does not work on all architectures, so they have a way to disable it).
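For illustration, here is one way such an instruction can be wrapped with an opt-out, given that it doesn't work everywhere. This is my own sketch: the ENABLE_NC_NO_ALLOCATE macro and the Hopper-only guard are assumptions, not DeepSeek's actual code.

```cuda
// Non-coherent load that asks L1 not to allocate a line and hints a 256B
// L2 prefetch, falling back to a plain read-only-cache load elsewhere.
__device__ __forceinline__ float ld_nc_no_allocate(const float* p) {
#if defined(ENABLE_NC_NO_ALLOCATE) && (__CUDA_ARCH__ >= 900)  // hypothetical opt-in
    float v;
    asm volatile("ld.global.nc.L1::no_allocate.L2::256B.f32 %0, [%1];"
                 : "=f"(v) : "l"(p));
    return v;
#else
    return __ldg(p);  // portable fallback via the read-only data cache
#endif
}
```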
Gonna check what SASS it gets translated to and whether it makes any sense.
I wonder if they have a SASS assembler for Hopper (built either by reverse-engineering nvdisasm or by fuzzing instructions + nvdisasm + staring hard) and don't want to say it out loud :p
Much of the hype around DeepSeek is due to their extraordinarily low training and inference costs. They achieved this by optimizing their training code, apparently using PTX in addition to CUDA. PTX is kind of an intermediate assembly language for NVIDIA GPUs and people are eager to see how it was used.