
Most people running local inference do so through quants with llama.cpp (which runs on everything) or AWQ/EXL2/MLX with vLLM/TabbyAPI/LM Studio, which are much faster than using PyTorch directly.
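
For context, here's a minimal sketch of the vLLM route, assuming vLLM is installed and you have an AWQ-quantized checkpoint handy (the model id below is just a placeholder, swap in whatever quant you actually use):

  # sketch: running an AWQ quant with vLLM's offline API
  from vllm import LLM, SamplingParams

  # model id is a placeholder; any AWQ-quantized checkpoint works here
  llm = LLM(model="some-org/Llama-3-8B-Instruct-AWQ", quantization="awq")
  params = SamplingParams(temperature=0.7, max_tokens=128)

  outputs = llm.generate(["Explain quantization in one sentence."], params)
  print(outputs[0].outputs[0].text)

The llama.cpp route is similar but loads a GGUF file instead, trading some throughput for running on basically any hardware.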

