
Is the PTX that everyone was looking forward to included this time?


Yes, there's some in the csrc/kernels directory. Search for 'asm' to find uses of it.
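If you'd rather script the search than grep by hand, here is a minimal sketch in Python. The function name `find_asm_lines` and the list of file suffixes are my own assumptions, not anything from the repository; it just reports every line containing the word `asm` under a directory.

```python
import re
from pathlib import Path

# Match "asm" as a whole word, so "asm volatile(...)" hits but "asmx" does not.
ASM_RE = re.compile(r"\basm\b")

def find_asm_lines(root, suffixes=(".cu", ".cuh", ".h", ".hpp", ".cpp")):
    """Return (path, line_number, stripped_line) for each line mentioning asm.

    The suffix list is an assumption about which files hold kernel code.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if ASM_RE.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_asm_lines("csrc/kernels")` from the repository root should list the inline PTX call sites, assuming the directory layout mentioned above.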


> the PTX that everyone was looking forward to

Could someone explain, for the rest of us, why this is so important?


Parallel Thread Execution. Think of PTX instructions as opcodes for Nvidia GPUs. They are a bit more complex than traditional CPU opcodes (the lowest level of abstraction accessible to users), as you can specify cache parameters, memory barriers, etc.

There are documented combinations of parameters for those instructions, but if you fuzz (search for new combinations in a random or organized way, hoping some will work the way you want), you can find new ones with unexpected effects or with advantages (such as not polluting caches, or better speed).

That is the case, for example, for ld.global.nc.L1::no_allocate.L2::256B, which DeepSeek uses and which provides significant acceleration while being reliable (although it does not work on all architectures, so they have ways to disable it).
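To make the "organized" search concrete, here is a rough sketch in Python of how one might enumerate candidate qualifier combinations for a global load. The qualifier lists are a small illustrative subset I picked, and `candidate_loads` is a hypothetical helper; this only generates instruction strings. Each candidate would still have to be assembled with ptxas and checked on real hardware, and most combinations may simply be rejected.

```python
from itertools import product

# Illustrative subsets of ld.global qualifiers; "" means "qualifier omitted".
# These lists are assumptions for the sketch, not an exhaustive catalogue.
NON_COHERENT = ["", ".nc"]
L1_HINTS = ["", ".L1::evict_normal", ".L1::evict_first",
            ".L1::evict_last", ".L1::no_allocate"]
L2_PREFETCH = ["", ".L2::64B", ".L2::128B", ".L2::256B"]

def candidate_loads(type_suffix=".s32"):
    """Build one inline-asm template string per qualifier combination."""
    candidates = []
    for nc, l1, l2 in product(NON_COHERENT, L1_HINTS, L2_PREFETCH):
        candidates.append(f"ld.global{nc}{l1}{l2}{type_suffix} %0, [%1];")
    return candidates

candidates = candidate_loads()
```

With these lists the sketch yields 2 × 5 × 4 = 40 candidates, and the combination quoted above is one of them.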


Gonna check what SASS it gets translated to and whether it makes any sense.

I wonder if they have a SASS assembler for Hopper (either by reverse engineering nvdisasm or by fuzzing instructions + nvdisasm + staring hard) and don't want to say so :p


You'd be looking at ptxas here. FWIW, it looks like it generates LDG.E.NA.LTC256B.U8.CONSTANT on my machine.


CPUs have instructions with similar semantics.


Much of the hype around DeepSeek is due to their extraordinarily low training and inference costs. They achieved this by optimizing their training code, apparently using PTX in addition to CUDA. PTX is kind of an intermediate assembly language for NVIDIA GPUs and people are eager to see how it was used.



