zyoralabs's comments

locomp is a Python-to-Metal kernel compiler for Apple Silicon. You write GPU kernels as decorated Python functions; locomp compiles them through an SSA intermediate representation to native Metal Shading Language, optimizes them (CSE, DCE, constant folding), and dispatches them on your Apple GPU.
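For anyone unfamiliar with the three optimizations named above, here is a minimal sketch of what constant folding, common subexpression elimination (CSE), and dead code elimination (DCE) do on a straight-line SSA program. This is not locomp's code or IR: the `Instr` type, the op names (`param`, `const`, `add`, `mul`, `store`), and the pass functions are all hypothetical, and the toy IR is integer-only with no control flow.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy SSA IR for illustration only; not locomp's actual IR.
@dataclass(frozen=True)
class Instr:
    dest: Optional[str]  # SSA name defined here (None for "store")
    op: str              # "param", "const", "add", "mul", or "store"
    args: tuple          # operand SSA names, or one int literal for "const"

PURE = {"const", "add", "mul"}  # ops that are safe to deduplicate

def constant_fold(prog):
    """Replace add/mul whose operands are all known constants with a const."""
    consts, out = {}, []
    for ins in prog:
        if ins.op == "const":
            consts[ins.dest] = ins.args[0]
        elif ins.op in ("add", "mul") and all(a in consts for a in ins.args):
            x, y = (consts[a] for a in ins.args)
            val = x + y if ins.op == "add" else x * y
            consts[ins.dest] = val
            ins = Instr(ins.dest, "const", (val,))
        out.append(ins)
    return out

def cse(prog):
    """Drop pure instructions that recompute an already available value."""
    seen, alias, out = {}, {}, []
    for ins in prog:
        if ins.op == "const":
            args = ins.args  # literal, nothing to rename
        else:
            args = tuple(alias.get(a, a) for a in ins.args)
        key = (ins.op, args)
        if ins.op in PURE and key in seen:
            alias[ins.dest] = seen[key]  # later uses point at the first copy
            continue
        if ins.op in PURE:
            seen[key] = ins.dest
        out.append(Instr(ins.dest, ins.op, args))
    return out

def dce(prog):
    """Keep stores and whatever they transitively depend on."""
    live, kept = set(), []
    for ins in reversed(prog):
        if ins.op == "store" or ins.dest in live:
            kept.append(ins)
            if ins.op != "const":
                live.update(ins.args)
    kept.reverse()
    return kept

def optimize(prog):
    return dce(cse(constant_fold(prog)))

prog = [
    Instr("x", "param", ()),
    Instr("out", "param", ()),
    Instr("c2", "const", (2,)),
    Instr("c3", "const", (3,)),
    Instr("c6", "mul", ("c2", "c3")),     # foldable: 2 * 3
    Instr("a", "mul", ("x", "c6")),
    Instr("b", "mul", ("x", "c6")),       # duplicate of a
    Instr("unused", "add", ("x", "c2")),  # result never used
    Instr("s", "add", ("a", "b")),
    Instr(None, "store", ("out", "s")),
]

optimized = optimize(prog)
for ins in optimized:
    print(ins)
```

In this sketch the ten input instructions shrink to six: `c6` becomes a literal constant, `b` is merged into `a`, and `unused`, `c2`, and `c3` are dropped because nothing reaching the store depends on them. A real compiler would also have to handle control flow, memory dependencies, and floating-point semantics, which this toy version ignores.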

Think Triton, but for Apple Silicon.

It supports the full kernel programming model: SIMD reductions, shared memory, atomics, simdgroup matrix ops (AMX hardware), auto-tuning, float16, and INT4/INT8 quantization. There are 54 working examples, including Flash Attention v1/v2/v3, paged attention, RoPE, and SwiGLU.

As a proof of concept, SmolLM2-135M runs end-to-end on locomp kernels. No PyTorch, no MLX, no Metal C++. Just @locomp.kernel Python.

pip install locomp

