
Check out the latest docs: https://mlc.ai/mlc-llm/docs/ MLC started with demos and has lately evolved, with API integrations and documentation, into an inference solution that anyone can reuse for universal deployment.


It's been a while since I looked into this, thanks.

As a random aside, I hope y'all publish an SDXL repo for local (non-WebGPU) inference. SDXL is too compute-heavy to split/offload to the CPU the way llama.cpp does, but it's less RAM-heavy than LLMs, and I'm thinking it would benefit from TVM's "easy" quantization.
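For context, the "easy" quantization in question is essentially group-wise low-bit weight quantization (store a small scale per group of weights, round each weight to a few bits). This is a minimal pure-Python sketch of the idea, not MLC/TVM's actual implementation:

```python
# Hypothetical sketch of group-wise int4 weight quantization,
# the general scheme MLC/TVM-style toolchains apply to model weights.
def quantize_group(weights, bits=4):
    """Quantize one group of float weights to signed low-bit ints + a scale."""
    qmax = 2 ** (bits - 1) - 1            # 7 for int4
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Round to nearest int and clamp to the signed range [-qmax-1, qmax].
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from the quantized group."""
    return [v * scale for v in q]
```

Each weight then costs `bits` bits plus an amortized share of the per-group scale, which is why RAM usage drops so sharply for big weight matrices.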

It would be a great backend to hook into the various web UIs, maybe with the secondary model loaded on an IGP.



