
It's not really. And 8x7B is not a 7B model; it's a MoE with roughly 47B total parameters that all have to be kept in memory, and it only activates 2 experts per token, so it runs at roughly 13B speeds.
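For anyone unfamiliar with why that's the case, here's a rough sketch of top-2 MoE routing in PyTorch. The sizes and layer shapes are made up for illustration, not Mixtral's actual config; the point is just that every expert's weights sit in memory while only two get run per token:

    # Illustrative top-2 MoE layer; d_model/d_ff/n_experts are arbitrary.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Top2MoE(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=8):
            super().__init__()
            self.gate = nn.Linear(d_model, n_experts, bias=False)
            # All n_experts FFNs must be resident in memory...
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):                      # x: (tokens, d_model)
            scores = self.gate(x)                  # (tokens, n_experts)
            weights, idx = scores.topk(2, dim=-1)  # ...but only 2 run per token
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for k in range(2):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e
                    if mask.any():
                        out[mask] += weights[mask, k, None] * expert(x[mask])
            return out

So the memory footprint scales with the total parameter count, but per-token compute scales with only the two selected experts.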

All of the current frameworks support MoE and sharding across GPUs, so I don't see what the issue is.
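For example, a minimal sketch of loading the public Mixtral checkpoint sharded across whatever GPUs are available, assuming the Hugging Face transformers + accelerate stack (the dtype and prompt are just examples; the actual memory split depends on your hardware):

    # Sketch: shard a MoE checkpoint across available GPUs with device_map="auto".
    # Requires transformers and accelerate to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",   # accelerate spreads the weights across GPUs
    )

    inputs = tok("Hello", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))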




