I find their reasoning unconvincing, although I think their conclusion is probably correct.
A much better paper is "Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models". They took it much further: they trained LLMs and then traced, for a given question, which documents in the pretraining dataset most influenced the model's answer. ngl, this paper AGI-pilled me.
https://arxiv.org/abs/2411.12580