
As I write this (after the updates to the evaluation code), https://pub.sakana.ai/ai-cuda-engineer/kernel/2/23/optimize-... is at the top of their list of speedups, with a claimed 128x speedup on a fused 3D convolution + groupnorm + mean.

The generated implementation doesn’t do a convolution.

The 2nd kernel on the leaderboard also appears to be incorrect: it has a bunch of dead code that computes a convolution and never uses the result, then writes tanhf(1.0f) * scaling_factor to every output.
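
For illustration, here is a rough sketch of the kind of check that would flag a kernel that ignores its input and writes a constant. Everything in it is hypothetical: `reference` is a made-up stand-in (tanh of the input mean, scaled), not the actual benchmark op, and `suspicious_kernel` only mimics the failure mode described above. The idea is simply to compare against a reference on several different random inputs, so that an output which never changes cannot pass.

```python
import math
import random

def reference(x, scaling_factor):
    # Hypothetical stand-in for the real op: depends on the input.
    return math.tanh(sum(x) / len(x)) * scaling_factor

def suspicious_kernel(x, scaling_factor):
    # Mimics the described failure mode: the input is ignored and
    # a constant times scaling_factor is written out.
    return math.tanh(1.0) * scaling_factor

def matches_reference(candidate, ref, trials=5, n=16, tol=1e-5, seed=0):
    # Compare on several distinct random inputs; a candidate that
    # ignores its input should disagree with the reference on some trial.
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
        s = rng.uniform(0.5, 2.0)
        if abs(candidate(x, s) - ref(x, s)) > tol:
            return False
    return True
```

A check like this only catches input-independent outputs; it says nothing about whether the speedup measurement itself is sound.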





