Hacker News | benryanx's comments

That's the postMessage bottleneck. PR #1 replaces it with Atomics-based dispatch, which should push utilisation much higher. Early numbers look like 6.4 tok/sec on an M2 Max.
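For anyone unfamiliar with the pattern, here is a rough sketch of what Atomics-based dispatch can look like in general. It is not the code from the PR. It assumes Node's worker_threads, and the control-word layout, the state names, the `dispatch` helper, and the doubling "job" are all hypothetical stand-ins. The idea is that the job and its result live in a SharedArrayBuffer and both sides sleep and wake with Atomics.wait and Atomics.notify, so no message is serialised or queued per job.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical layout of the shared control block (all Int32 slots).
const STATE = 0, IN = 1, OUT = 2;
// Hypothetical state values.
const IDLE = 0, JOB = 1, DONE = 2, STOP = 3;

const sab = new SharedArrayBuffer(3 * Int32Array.BYTES_PER_ELEMENT);
const ctl = new Int32Array(sab);

// Worker source kept inline so the sketch is a single file.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const ctl = new Int32Array(workerData);
  const STATE = 0, IN = 1, OUT = 2;
  const JOB = 1, DONE = 2, STOP = 3;
  let seen = 0; // last state this worker observed (starts at IDLE)
  for (;;) {
    Atomics.wait(ctl, STATE, seen);      // sleep while the state is unchanged
    const s = Atomics.load(ctl, STATE);
    if (s === STOP) break;
    if (s === JOB) {
      ctl[OUT] = ctl[IN] * 2;            // stand-in for real work
      Atomics.store(ctl, STATE, DONE);   // publish the result
      Atomics.notify(ctl, STATE);        // wake the dispatcher
      seen = DONE;
    } else {
      seen = s;
    }
  }
`;

// The SharedArrayBuffer is shared with the worker, not copied.
new Worker(workerSource, { eval: true, workerData: sab });

// Hand one job to the worker and block until it reports DONE.
function dispatch(x) {
  ctl[IN] = x;
  Atomics.store(ctl, STATE, JOB);
  Atomics.notify(ctl, STATE);
  // Returns immediately with 'not-equal' if the worker already finished.
  if (Atomics.wait(ctl, STATE, JOB, 5000) === 'timed-out') {
    throw new Error('worker did not respond');
  }
  return ctl[OUT];
}

const results = [dispatch(21), dispatch(5)];
console.log(results);

// Tell the worker to exit so the process can finish.
Atomics.store(ctl, STATE, STOP);
Atomics.notify(ctl, STATE);
```

One caveat: blocking with Atomics.wait works on Node's main thread but throws on a browser's main thread, so in a browser the dispatcher would have to live in a worker itself or use a non-blocking wait.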

The part I'd point people to first is ARCHITECTURE.md, specifically the WASM binary construction section. Every other CPU inference project I know of uses Emscripten or a compiled Rust backend. PureBee builds the binary in JavaScript. That's the claim I'd most want challenged, in case I'm wrong about it being novel.
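To make "builds the binary in JavaScript" concrete, here is a toy sketch of the general technique, not PureBee's actual builder (see ARCHITECTURE.md for that). It hand-assembles the bytes of a minimal WASM module exporting a two-argument i32 `add` function and instantiates it with the standard WebAssembly API. The helper names `uleb128` and `section` are my own for illustration.

```javascript
// Unsigned LEB128: the variable-length integer encoding WASM uses for sizes and counts.
function uleb128(n) {
  const out = [];
  do {
    let byte = n & 0x7f;
    n >>>= 7;
    if (n !== 0) byte |= 0x80; // set the continuation bit if more bytes follow
    out.push(byte);
  } while (n !== 0);
  return out;
}

// A section is: id byte, payload length, payload bytes.
const section = (id, payload) => [id, ...uleb128(payload.length), ...payload];

const I32 = 0x7f; // value type code for i32
const name = [...'add'].map((c) => c.charCodeAt(0));

// Function body: zero local declarations, then
// local.get 0, local.get 1, i32.add, end.
const body = [0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b];

const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  ...section(1, [0x01, 0x60, 0x02, I32, I32, 0x01, I32]), // type 0: (i32, i32) -> i32
  ...section(3, [0x01, 0x00]), // one function, using type 0
  ...section(7, [0x01, ...uleb128(name.length), ...name, 0x00, 0x00]), // export "add" as func 0
  ...section(10, [0x01, ...uleb128(body.length), ...body]), // code for func 0
]);

// No Emscripten or Rust toolchain involved: the bytes above are the whole module.
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
const sum = instance.exports.add(2, 3);
console.log(sum);
```

A real inference kernel would emit far more opcodes plus a memory section, but the structure is the same: sections of length-prefixed bytes assembled in plain arrays.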
