
If you want decent performance (more than, say, 20 tokens/s) for your dev team, you absolutely do need the whole model in VRAM.
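
For a rough sense of why, here is a back-of-envelope sketch in Python. It assumes single-stream decoding is memory-bandwidth bound, so every generated token has to read each weight once from wherever it lives. The function name, the 40 GB model size, and both bandwidth figures are illustrative assumptions, not measurements, and the estimate ignores KV cache traffic, batching, and sparse (MoE) models.

```python
def estimate_tokens_per_second(
    model_bytes: float,
    vram_fraction: float,
    vram_bandwidth: float,
    ram_bandwidth: float,
) -> float:
    """Rough upper bound on decode speed for a dense model.

    Assumes each token reads every weight once, and that the VRAM-resident
    and RAM-resident parts are read one after the other.
    Bandwidths are in bytes per second.
    """
    if not 0.0 <= vram_fraction <= 1.0:
        raise ValueError("vram_fraction must be between 0 and 1")
    if model_bytes <= 0 or vram_bandwidth <= 0 or ram_bandwidth <= 0:
        raise ValueError("sizes and bandwidths must be positive")

    # Time to stream the weights held in VRAM for one token.
    vram_seconds = model_bytes * vram_fraction / vram_bandwidth
    # Time to stream the weights offloaded to system RAM for one token.
    ram_seconds = model_bytes * (1.0 - vram_fraction) / ram_bandwidth
    return 1.0 / (vram_seconds + ram_seconds)


GB = 1e9
MODEL_BYTES = 40 * GB        # assumed: quantized model of about 40 GB
VRAM_BANDWIDTH = 1000 * GB   # assumed: about 1 TB/s GPU memory
RAM_BANDWIDTH = 50 * GB      # assumed: about 50 GB/s system memory

all_in_vram = estimate_tokens_per_second(
    MODEL_BYTES, 1.0, VRAM_BANDWIDTH, RAM_BANDWIDTH
)
half_offloaded = estimate_tokens_per_second(
    MODEL_BYTES, 0.5, VRAM_BANDWIDTH, RAM_BANDWIDTH
)

print(f"all in VRAM:    {all_in_vram:.1f} tokens/s")
print(f"half offloaded: {half_offloaded:.1f} tokens/s")
```

With those assumed numbers the ceiling is about 25 tokens/s with everything in VRAM and about 2.4 tokens/s with half the weights offloaded, because the slow tier dominates the time per token.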