Total training FLOPs can be deduced from the model architecture (which they can't hide since they released the weights) and how many tokens they trained on. With total training FLOPs and GPU hours you can calculate MFU. The MFU of their DeepSeek-V3 run works out to around 40%, which sounds right; both Google and Meta have reported higher MFU, so the GPU-hour figure should be correct. The only thing they could have lied about is how many tokens they trained the model on. DeepSeek reported 14T, which is also similar to what Meta did, so nothing crazy here.
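A minimal back-of-the-envelope sketch of that check, assuming the commonly cited approximate figures (roughly 37B activated parameters per token, ~14.8T training tokens, ~2.79M H800 GPU-hours, ~989 TFLOP/s peak dense BF16 per H800) and the standard 6*N*D FLOPs rule. The rule ignores attention FLOPs, so the printed MFU is a lower bound; none of these numbers are exact.

```python
# Rough estimate of training FLOPs and MFU from public-ish numbers.
# All inputs are approximations; 6*N*D omits attention FLOPs, so the
# real MFU lands somewhat above what this prints.

ACTIVATED_PARAMS = 37e9      # approx. activated params per token (MoE)
TRAINING_TOKENS  = 14.8e12   # approx. reported pre-training tokens (~14T)
GPU_HOURS        = 2.788e6   # approx. reported H800 GPU-hours
PEAK_FLOPS       = 989e12    # approx. peak dense BF16 FLOP/s per H800

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard ~6*N*D estimate: ~2 FLOPs/param forward + ~4 backward per token."""
    return 6.0 * n_params * n_tokens

def mfu(total_flops: float, gpu_hours: float, peak_flops: float) -> float:
    """Model FLOPs utilization: useful FLOPs / FLOPs the GPUs could have delivered."""
    available_flops = gpu_hours * 3600.0 * peak_flops
    return total_flops / available_flops

if __name__ == "__main__":
    flops = training_flops(ACTIVATED_PARAMS, TRAINING_TOKENS)
    print(f"Estimated training FLOPs: {flops:.3e}")
    print(f"Implied MFU (lower bound): {mfu(flops, GPU_HOURS, PEAK_FLOPS):.0%}")
```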
tl;dr all the numbers check out, and the gains come from the model architecture innovations they made.