
Well, their huge GPU clusters have "insane VRAM". Once you can actually load the model without offloading, inference is mostly memory-bandwidth-bound rather than compute-bound, so it isn't all that computationally expensive for the most part.
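A rough back-of-envelope sketch of that point (my own numbers, not from the comment): in batch-1 autoregressive decoding, each generated token has to stream essentially all of the weights from VRAM, so the bandwidth ceiling on tokens/sec is far below the compute ceiling. The GPU figures below (80 GB, ~3 TB/s, ~1000 TFLOPS) are hypothetical placeholders.

```python
# Hypothetical back-of-envelope estimate: why batch-1 LLM decoding is
# memory-bandwidth-bound once the weights fit entirely in VRAM.

def decode_estimates(params_billions, bytes_per_param, vram_gb,
                     mem_bandwidth_gbps, peak_tflops):
    """Rough per-token ceilings for batch-1 autoregressive decoding."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    fits_in_vram = weight_bytes <= vram_gb * 1e9

    # Each generated token reads (roughly) every weight from VRAM once.
    tokens_per_sec_bandwidth = (mem_bandwidth_gbps * 1e9) / weight_bytes

    # Each token needs about 2 FLOPs per parameter (one multiply-accumulate).
    flops_per_token = 2 * params_billions * 1e9
    tokens_per_sec_compute = (peak_tflops * 1e12) / flops_per_token

    return fits_in_vram, tokens_per_sec_bandwidth, tokens_per_sec_compute


# Example: a 70B-parameter model at 8-bit weights on a hypothetical GPU
# with 80 GB VRAM, ~3 TB/s memory bandwidth, and ~1000 TFLOPS of compute.
fits, bw_limit, compute_limit = decode_estimates(70, 1, 80, 3000, 1000)
print(f"weights fit in VRAM:        {fits}")
print(f"bandwidth-limited ceiling: ~{bw_limit:.0f} tok/s")
print(f"compute-limited ceiling:   ~{compute_limit:.0f} tok/s")
```

With these placeholder numbers the compute ceiling is two orders of magnitude above the bandwidth ceiling, which is the sense in which inference "isn't computationally expensive" once the model loads without offloading.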




