Alas, no, though I'm going to think out loud a bit. I've had to go from making a comment like this once a month to twice a week, so I'm curious what pops out as helpful to point to.
Forgive the opinionated language; it's more concise and makes it clearer exactly what I can give evidence of:
- December 22: proto-AI influencers are latching onto GPT4 rumors as a source of engagement. A bunch of people start repeating "RUMORS say GPT4 has ONE TRILLION parameters." Altman laughs it off, most people laugh, it's not quite so big a community yet.
This percolates, but you kinda ignore it: it's confined to non-tech people and it's unfalsifiable.
- Feb 23: the GPT3.5 API announcement lands, the news cycle runs dry, and GPT4 stuff circulates again. A Microsoft Europe executive throws gas on the fire by confirming its release about 1.5 weeks ahead of time. These claims circulate in coverage of what GPT4 might be. However, the circulation is still 99.99% in non-tech circles.
- Mar 23: GPT4 comes out. By now "Chinchilla scaling laws" went from something 10% of the tech crowd following AI knew about to maybe 0.1%, because the audience grew far faster than the expertise. OpenAI releases ~0 information on parameter count, training, or runtime details: just a visualization of a Chinchilla-fit scaling curve and a note that they were able to predict the model's abilities in advance based on scaling laws. (For why Chinchilla scaling makes the trillion-parameter claim implausible, see the back-of-envelope sketch after this list.)
- Apr 23: the GPT4 release content is old now, and people needing content venture into claiming details about the model from leaks. It's just the same trillion-parameter thing.
- May 23: Tech substacks begin offering a perspective on AI. They're new and don't know enough to know Altman laughed it off... and that it would be absurd for 100 other reasons. It comes up. A particularly famous blog handwaves about "mixture of experts" to explain how the trillion-parameter number could make sense, despite the most basic reason it wouldn't (Chinchilla scaling) and the most factual reason it isn't (Altman laughing it off). "Altman was just parsing the idea closely to hide details, it was a showman stunt!"
- Jun 23: The tech community interested in AI outstrips the sober-minded/experienced-with-LLMs crowd by 1000:1, and this sounds plausible, and it's unfalsifiable. There is no proof it _isn't_ true, it could be true, and it's a comfortable way to "understand" without putting in the work to understand. People start laundering it into HN subdiscussions. I see it once the whole month.
- end of July 23: I've seen it every week in July, twice this week.
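Since Chinchilla scaling is the most basic reason the trillion-parameter claim never made sense, here's the back-of-envelope math. A minimal sketch, assuming the ~20-tokens-per-parameter compute-optimal rule of thumb and the standard C ≈ 6ND training-FLOPs approximation, both from Hoffmann et al. (2022); these are order-of-magnitude figures, not precise constants.

```python
# Back-of-envelope Chinchilla math: why a compute-optimal 1T-parameter
# *dense* model was implausible in the 2022/23 timeframe.
# Rule of thumb (Hoffmann et al., 2022): ~20 training tokens per parameter.

TOKENS_PER_PARAM = 20  # Chinchilla-optimal ratio (approximate)

def chinchilla_budget(n_params: float) -> tuple[float, float]:
    """Return (optimal training tokens, training FLOPs) for a dense model.

    Uses the standard approximation C ~= 6 * N * D training FLOPs,
    where N is parameter count and D is token count.
    """
    tokens = TOKENS_PER_PARAM * n_params
    flops = 6 * n_params * tokens
    return tokens, flops

for n in (70e9, 175e9, 1e12):  # Chinchilla-70B, GPT-3-sized, the rumored 1T
    tokens, flops = chinchilla_budget(n)
    print(f"{n/1e9:>6.0f}B params -> {tokens/1e12:>5.1f}T tokens, {flops:.1e} FLOPs")

# 1T params -> ~20T tokens and ~1.2e26 training FLOPs: far beyond any
# dataset or compute budget that was plausible for a 2022-era dense model.
```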
This is the first time I've seen mixture of experts simplified to "it generates 16 answers and picks one." Which is a thing! Except that's top-K, not mixture of experts.
And it's a _completely independent claim_ from the original misunderstandings: it's a misunderstanding of the misunderstandings, one that shores up the weak points of the misunderstandings.
Yet the claim would only make sense if the misunderstandings were true at face value, weak points and all: generating 16 outputs from the same model and picking one has existed for a very, very long time. I only got in on this in 2019, but it's been around since at least then, and I'm almost certain someone with formal ML training will pop in and say "1965 bro".
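To make the conflation concrete, here's a minimal sketch of the two mechanisms being mixed up. Nothing here is a claim about GPT4's internals; the layer sizes and the `generate`/`score` callables are made up for illustration. The "16 answers" pattern is usually called best-of-n or sample-and-rank.

```python
# Illustrative only: the two mechanisms people are conflating.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Mixture of experts: ONE forward pass, each token routed to a few
    expert sub-networks by a learned gate. It does not produce multiple
    answers; it produces one output per token."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

def best_of_n(generate, score, prompt, n=16):
    """Sample-and-rank: call the SAME model n times and keep the best
    sample. This is the 'generate 16 answers and pick one' pattern."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)
```

The point: MoE routing happens per token inside a single forward pass, while sample-and-rank runs the whole model repeatedly. They're unrelated mechanisms, which is exactly why gluing them together is a misunderstanding of a misunderstanding.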
Wait, so it was never even confirmed or actually leaked by OpenAI that they're using a MoE model? That was just invented by some blog? I've seen it mentioned everywhere as though it's true.
I think it's likely they're using a technique that is similar to, or a descendant of, the Tree of Thought technique. In Karpathy's talk, where he wasn't allowed to discuss GPT4's architecture and so could only discuss information in the public domain about other models, he pretty strongly indicated that ToT was the direction of research he thought people should pursue. In the past, Karpathy has communicated basically as much as he can to educate people about how these models are made and how to build one yourself; he has one of the best YouTube tutorials on building an LLM from scratch. I suspect he personally does not agree with OpenAI's level of secrecy, but at minimum he shares a lot more information publicly than most OAI employees.
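For anyone who hasn't seen it, here's roughly the shape of the Tree of Thought pattern, as a minimal sketch after Yao et al. (2023). This is not anything confirmed about GPT4; `propose` and `evaluate` stand in for LLM calls and are entirely hypothetical.

```python
# Minimal Tree-of-Thought-style search sketch (after Yao et al., 2023).
import heapq

def tree_of_thought(problem, propose, evaluate,
                    branch=3, beam=5, depth=4):
    """Breadth-limited search over partial reasoning paths ("thoughts").

    propose(problem, path)  -> list of candidate next thoughts (LLM call)
    evaluate(problem, path) -> float, heuristic promise of a path (LLM call)
    """
    frontier = [[]]  # each entry is a list of thoughts so far
    for _ in range(depth):
        candidates = []
        for path in frontier:
            for thought in propose(problem, path)[:branch]:
                candidates.append(path + [thought])
        # keep only the `beam` most promising partial paths
        frontier = heapq.nlargest(
            beam, candidates, key=lambda p: evaluate(problem, p))
        if not frontier:
            break
    return max(frontier, key=lambda p: evaluate(problem, p), default=[])
```

Note that, like best-of-n above, this is a search procedure layered on top of a model at inference time, not an architecture claim.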