It's just a dude retweeting a Substack post. I wouldn't bet against it,* but I wouldn't bet on it either. His tweet would have just linked to the article in the top comment.
* I used to crusade against this rumor, because the only source is that article and people repeating it. But I imagine it's a no-brainer once you have enough users: you essentially get a throughput bump 'for free' even if the model weights are huge. In other words, it's better to utilize as much GPU RAM as you can muster; the cost of needing more GPU RAM is offset by being able to run multiple inference requests against the model all the time anyway.
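The amortization argument in the footnote can be sketched with toy numbers (all hypothetical, not a claim about GPT-4's actual shape): with a mixture-of-experts, the full weights still sit in GPU RAM, but each token only activates a fraction of them, so per-token FLOPs drop and throughput rises for the same memory footprint.

```python
# Toy back-of-the-envelope model of the footnote's argument.
# Every number here is made up for illustration; nothing is a
# claim about GPT-4's real parameter count or expert count.

def tokens_per_second(active_params_b, flops_per_sec_t=300.0):
    """Throughput is roughly bound by FLOPs spent per token:
    a forward pass costs ~2 * active_params FLOPs per token."""
    flops_per_token = 2 * active_params_b * 1e9
    return (flops_per_sec_t * 1e12) / flops_per_token

# Dense model: all 180B (hypothetical) weights active on every token.
dense = tokens_per_second(active_params_b=180)

# MoE with the same 180B total weights resident in GPU RAM,
# but only 1 of 8 experts (hypothetically) active per token.
moe = tokens_per_second(active_params_b=180 / 8)

print(f"dense:   {dense:,.0f} tok/s")
print(f"moe:     {moe:,.0f} tok/s")
print(f"speedup: {moe / dense:.1f}x")  # the 'free' throughput bump
```

Same GPU RAM footprint either way; the MoE just spends far fewer FLOPs per token, which is the sense in which the extra RAM pays for itself.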
Didn't Yam Peleg's tweet / leak confirm this one? He could be wrong about it, of course, but I thought the consensus by now was that it's true.
(Copy of the removed tweets at https://www.reddit.com/r/mlscaling/comments/14wcy7m/gpt4s_de... )