We just dropped a new open source project: a CUDA to cuTile transpiler for NVIDIA's CUDA 13.1.
We built a transpiler that converts your CUDA kernels to cuTile automatically. It figures out what your kernel does (flash attention, matrix multiplication, RoPE) and writes the cuTile version.
Zero AI involved! It's pure pattern matching and code analysis
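To give a feel for what "pure pattern matching" can look like, here is a minimal sketch in Python. It is not the project's actual code: the `SIGNATURES` table, the regexes, and `classify_kernel` are all hypothetical names made up for illustration, and a real transpiler would presumably analyze a parsed AST rather than raw text.

```python
import re

# Hypothetical signature table (an assumption, not the project's real rules):
# a kernel is labelled with the first pattern whose regexes ALL match its source.
SIGNATURES = {
    # A loop that accumulates a product of two indexed arrays: acc += A[...] * B[...]
    "gemm": [r"\bfor\s*\(", r"\+=\s*\w+\s*\[[^\]]*\]\s*\*\s*\w+\s*\["],
    # Shared memory, a barrier, and a stride that halves each iteration
    "reduction": [r"__shared__", r"__syncthreads\s*\(\s*\)", r">>=\s*1|/=\s*2"],
    # A plain indexed store such as out[i] = ...
    "elementwise": [r"\w+\s*\[\s*\w+\s*\]\s*=[^=]"],
}

def classify_kernel(source: str) -> str:
    """Return the name of the first matching pattern, or 'unknown'."""
    for name, regexes in SIGNATURES.items():
        if all(re.search(rx, source) for rx in regexes):
            return name
    return "unknown"

# Small CUDA snippets used only as sample input text.
GEMM_SRC = """
__global__ void matmul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int k = 0; k < N; ++k) {
        acc += A[row * N + k] * B[k * N + col];
    }
    C[row * N + col] = acc;
}
"""

REDUCTION_SRC = """
__global__ void sum(const float* in, float* out, int n) {
    __shared__ float buf[256];
    int tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0];
}
"""

ELEMENTWISE_SRC = """
__global__ void add(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}
"""

if __name__ == "__main__":
    for src in (GEMM_SRC, REDUCTION_SRC, ELEMENTWISE_SRC):
        print(classify_kernel(src))
```

Order matters in this sketch: more specific patterns come first, since an elementwise-style store also shows up inside GEMM and reduction kernels. Once a kernel is labelled, a transpiler along these lines could fill in a per-pattern output template.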
the fact that this runs butter-smooth on webgl while my company's 'enterprise dashboard' struggles to render 50 divs says everything about how much performance we leave on the table with bad abstractions
Image data and telemetry were sent in different messages, so it wasn't too much of a bottleneck. The images were about 100 bytes while the telemetry was roughly 40 bytes.
my internet has been broken for 17 months but you're more upset about me using ai to make my sentences sound professional than about comcast refusing to fix their infrastructure
Currently supports 18 kernel patterns:
- Core: GEMM, Reduction, Scan, Stencil, Elementwise, FFT
- ML/DL: Convolution (1D/2D/3D), Pooling, Normalization
- LLM: Flash Attention, RoPE, KV Cache, Quantization (INT8/FP8)
- Specialized: Sparse matrices, Histogram, Sorting
Contributions we need: More kernel pattern templates