
I still don't understand how George Hotz would know about the GPT-4 architecture, if it is true.


As someone who has worked in the field for many years and closely follows not just the engineering side but also the academic literature and the personnel movements on LinkedIn, I too was able to put together a lot of this. Especially with GPT-3.5 Turbo it was obvious what they did, given the speed difference, at least in terms of model architecture and the order of magnitude of the parameter count. From there you could do some back-of-the-envelope calculations and guess how big GPT-4 had to be given its speed. I wouldn't have dared to state any specific numbers with authority, but maybe Hotz has talked to someone at OpenAI. On the other hand, the updated article now claims his numbers were off by a factor of 2 (at least for the individual experts; he still got the total number of parameters right). So yeah, maybe he was just guessing like the rest of us after all.


You don't necessarily need to know the architecture. The "only" real metric regarding speed is tokens/sec, and that is largely bound by memory bandwidth, so you can infer the size of the model with some certainty.
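A minimal sketch of that back-of-the-envelope inference, assuming fp16 weights, roughly A100-class HBM bandwidth, and a guessed GPU count (none of which is confirmed by OpenAI):

    # At batch size 1, autoregressive decoding is memory-bandwidth bound:
    # every generated token requires streaming all (active) weights from
    # HBM once, so tokens/sec ~= aggregate_bandwidth / bytes_of_active_weights.
    # Invert that to estimate model size from observed speed.
    # All concrete numbers below are illustrative assumptions.

    def estimate_active_params(tokens_per_sec: float,
                               num_gpus: int,
                               hbm_bandwidth_gb_s: float = 2000.0,  # ~A100 80GB, assumed
                               bytes_per_param: float = 2.0,        # fp16/bf16, assumed
                               efficiency: float = 0.5) -> float:   # fraction of peak BW achieved, assumed
        """Rough count of parameters touched per token, given observed speed."""
        aggregate_bw = num_gpus * hbm_bandwidth_gb_s * 1e9 * efficiency  # bytes/sec
        bytes_per_token = aggregate_bw / tokens_per_sec
        return bytes_per_token / bytes_per_param

    # Example: ~15 tokens/sec on a guessed 8 GPUs works out to roughly
    # 2.7e11 parameters read per token -- low hundreds of billions, far
    # fewer than a dense trillion-parameter model would require, which is
    # one hint toward sparsity/MoE.
    if __name__ == "__main__":
        print(f"{estimate_active_params(15, 8):.2e} active params (rough guess)")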

Also, if we have been eating up posted "benchmarks" with no way to independently validate them and watching heavily edited video presentations, why can't we trust our wonder kid?


That doesn't explain how we would know that GPT-4 is a sparse MoE model with X experts of size Y, of which Z are used during inference.
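For concreteness, here is the arithmetic such a claim implies, a minimal sketch using the rumored (unconfirmed) figures from the updated article rather than anything OpenAI has stated:

    # What "sparse MoE with X experts of size Y, using Z per token" implies
    # for total vs. active parameter counts. The numbers below are the
    # rumored figures circulating after the leak, used only as an example.

    def moe_param_counts(num_experts: int,
                         expert_params: float,
                         experts_per_token: int,
                         shared_params: float = 0.0):
        """Return (total, active-per-token) parameter counts for a sparse MoE."""
        total = num_experts * expert_params + shared_params
        active = experts_per_token * expert_params + shared_params
        return total, active

    # Rumored GPT-4 figures (assumption, not confirmed): 16 experts of
    # ~111B parameters each, 2 routed per token, ~55B shared attention params.
    total, active = moe_param_counts(16, 111e9, 2, 55e9)
    print(f"total  ~{total / 1e12:.2f}T params")     # ~1.83T
    print(f"active ~{active / 1e9:.0f}B per token")  # ~277B

The point of the sparse MoE setup is that total parameter count and per-token memory traffic diverge, which is exactly what the speed-based estimates above can hint at but not prove.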


IIRC it was leaked/confirmed by accident or something like that




