Show HN: UNet diffusion model in pure CUDA (github.com/clu0)
8 points by clu0 3 days ago
Hi HN!

I was inspired by Andrej Karpathy's llm.c (https://github.com/karpathy/llm.c) and wrote a full diffusion model training loop in CUDA. I learnt a lot about CUDA from Simon Boehm's matmul blog post (https://siboehm.com/articles/22/CUDA-MMM).

There is still a lot of room for optimization: the model currently runs at 45% of the speed of PyTorch with torch.compile.

I'm curious about any thoughts or CUDA tips for convolutions.





