
> It's literally just a matter of making them a little faster and less resource intensive

More like a few orders of magnitude faster and less resource intensive. The elephant in the room with this technology is that it's currently offloaded to an expensive cloud instance where it can have an entire high-end GPU devoted to it. For games to adopt it as a matter of course, without a significant ongoing cost and a point of failure when the servers go down or the money to pay the bills runs out, it needs to be fast enough to run locally on the player's machine while that machine is busy handling everything else in the game. Realistically you can afford to use maybe 5-10% of the GPU's time and memory, and you can't expect everyone to have an RTX 4090, so it needs to scale down gracefully.
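A quick back-of-envelope sketch of what that 5-10% budget actually means. All numbers here are illustrative assumptions (60 fps target, a 5% GPU-time share, a hypothetical 20 tokens/s local throughput), not measurements:

```python
# Sketch of the local-inference budget argument above.
# Every number is an illustrative assumption, not a benchmark.

def frame_budget_ms(fps: int = 60, llm_share: float = 0.05) -> float:
    """Milliseconds per frame the LLM may use at a given share of GPU time."""
    frame_ms = 1000.0 / fps
    return frame_ms * llm_share

def reply_latency_s(tokens: int, tok_per_s: float) -> float:
    """Seconds to generate a reply of `tokens` tokens at `tok_per_s` throughput."""
    return tokens / tok_per_s

budget = frame_budget_ms()                 # ~0.83 ms of each 60 fps frame at 5%
latency = reply_latency_s(50, 20.0)        # 50-token reply at 20 tok/s -> 2.5 s
print(f"{budget:.2f} ms/frame for the LLM, {latency:.1f} s per 50-token reply")
```

Even under these generous assumptions, a short dialog reply takes seconds of wall-clock time while only sipping the per-frame budget, which is why the "orders of magnitude" framing is fair.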

I think that hardware will adapt to serve the need. It might not look like dedicated AI accelerator cards the way 3D accelerator cards did in the '90s, but I think compute will be carved out just to run on-device LLMs. 7B-param models are already really fast on CPU and fit in memory on a pretty normal system; you could probably come up with a technique to do good character dialog with a 7B model.
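The "fits on a pretty normal system" claim is easy to sanity-check: the weight footprint is just parameter count times bytes per parameter. The bytes-per-parameter figures below are the standard ones for fp16, 8-bit, and 4-bit quantization; KV cache and activations are ignored, so real usage is somewhat higher:

```python
# Rough weight footprint of a 7B-parameter model at common precisions.
# Ignores KV cache and activation memory, so treat these as lower bounds.

def model_size_gib(params_billions: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return params_billions * 1e9 * bytes_per_param / 2**30

for label, bpp in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"7B @ {label}: ~{model_size_gib(7, bpp):.1f} GiB")
```

At 4-bit quantization the weights come to roughly 3.3 GiB, which is why 7B models are plausible alongside a running game on ordinary consumer hardware, while fp16 (~13 GiB) is not.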
