Data is the wrong approach to develop reasoning. You we don't want LLM's to simp...

hackinthebochs · 2024-10-11T14:35:53.000000Z

> If they have developed reasoning very few examples should be needed.

Yes, once the modules for reasoning have converged, the model will need very few examples to update to new types of reasoning. But developing those modules from scratch requires a large number of examples, enough to overtax the model's ability to memorize. We see this pattern in the "grokking" papers: memorization happens first, then "grokking" (god I hate that word).

It's not like humans bootstrap reasoning out of nothing. We have a billion years of evolution that encoded the right inductive biases in our developmental pathways to quickly converge on the structures for reasoning. Training an LLM from scratch is like recapitulating the entire history of evolution in a few months.