
> We don't know GPT-4 is MoE

Didn't Yam Peleg's tweet / leak confirm this one? He could be wrong about this, of course, but I thought the consensus by now was that it's true.

(Copy of the removed tweets at https://www.reddit.com/r/mlscaling/comments/14wcy7m/gpt4s_de... )




It's just a dude retweeting a Substack. I wouldn't bet against it*, but I wouldn't bet on it either. His tweet would have just been relaying the article linked in the top comment.

* I used to crusade against this rumor, because the only source is that article and people repeating it. But I imagine it's a no-brainer for them: with enough users they essentially get a throughput bump 'for free' even if the model weights are huge. I.e., it's better to utilize as much GPU RAM as you can muster; the cost of needing more GPU RAM is offset by being able to run many inferences against the same loaded model all the time anyway.
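To make the amortization argument above concrete, here's a toy back-of-envelope sketch (all numbers and the cost function are hypothetical, not based on any actual GPT-4 figures): keeping huge weights resident in GPU RAM is a fixed cost, so the more concurrent requests share that resident model, the smaller the weight cost per request becomes.

```python
def cost_per_request(weight_cost: float, per_request_cost: float, batch_size: int) -> float:
    """Amortized cost of serving one request.

    weight_cost: fixed cost of keeping the model weights resident (hypothetical units);
    per_request_cost: marginal compute cost of one request;
    batch_size: number of concurrent requests sharing the loaded weights.
    """
    return weight_cost / batch_size + per_request_cost

# Hypothetical numbers: weights dominate when serving a single request...
solo = cost_per_request(weight_cost=100.0, per_request_cost=1.0, batch_size=1)

# ...but with many users, the fixed weight cost is spread thin.
busy = cost_per_request(weight_cost=100.0, per_request_cost=1.0, batch_size=64)

print(solo, busy)  # the amortized cost drops sharply with batching
```

The point of the sketch: at high request volume the denominator grows, so a bigger (e.g. MoE) model's extra RAM footprint matters much less than it would for a lightly loaded deployment.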



