
This was the first thought that came to my mind too.

Given that it's sparse, will this just be a replacement for MoE?




MoE is mostly used to enable load balancing, since it makes it possible to put experts on different GPUs. This isn't so easy to do with a monolithic but sparse layer.
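To make the placement point concrete, here is a rough sketch assuming top-1 routing and one expert per GPU. The names (`EXPERT_TO_GPU`, `route_top1`, `dispatch`) and the gate scores are made up for illustration and are not from any particular framework.

```python
NUM_EXPERTS = 4

# Hypothetical placement: expert i lives on GPU i.
EXPERT_TO_GPU = {e: f"gpu{e}" for e in range(NUM_EXPERTS)}


def route_top1(gate_scores):
    """Return the index of the highest-scoring expert for one token."""
    return max(range(len(gate_scores)), key=lambda e: gate_scores[e])


def dispatch(batch_gate_scores):
    """Group token indices by the GPU that hosts their chosen expert."""
    per_gpu = {gpu: [] for gpu in EXPERT_TO_GPU.values()}
    for token_idx, scores in enumerate(batch_gate_scores):
        expert = route_top1(scores)
        per_gpu[EXPERT_TO_GPU[expert]].append(token_idx)
    return per_gpu


# Made-up gate scores for 6 tokens over 4 experts.
scores = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.2, 0.5, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
]

assignment = dispatch(scores)
# Tokens handled per GPU; a real system would try to keep these counts even.
load = {gpu: len(tokens) for gpu, tokens in assignment.items()}
```

In this toy setup each GPU only needs its own expert's weights and its own list of tokens, so the work splits along expert boundaries. A monolithic sparse layer has no such ready-made partition.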



