Yes, those methods can give you a blurry idea of whether one model is "smarter" than another. But none of them can be used to anchor the claim that "intelligence is proportional to the log of compute", which is the claim sama was making.