It *is* possible to learn to reason from scratch, that's what R1-0 did, but the ...

dchichkov · 2025-01-29T22:34:05 1738190045

If you look at the benchmarks of the DeepSeek-V3-Base, it is quite capable, even in 0-shot: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base#base-mod... This is not from scratch. These benchmark numbers are an indication that the base model already had a large number of reasoning/LLM tokens in the pre-training set.

On the other hand, my take on it, the ability to do reasoning in a long context is a general capability. And my guess is that it can be bootstrapped from scratch, without having to do training on all of the internet or having to distill models trained on the internet.

cma · 2025-01-30T00:53:20 1738198400

> These benchmark numbers are an indication that the base model already had a large number of reasoning/LLM tokens in the pre-training set.

But we already know that is the case: the Deepseek v3 paper says it was posttrained partly with an internal version of R1:

> Reasoning Data. For reasoning-related datasets, including those focused on mathematics, code competition problems, and logic puzzles, we generate the data by leveraging an internal DeepSeek-R1 model. Specifically, while the R1-generated data demonstrates strong accuracy, it suffers from issues such as overthinking, poor formatting, and excessive length. Our objective is to balance the high accuracy of R1-generated reasoning data and the clarity and conciseness of regularly formatted reasoning data.

And deepseekmath did a repeated cycle of this kind of thing mixing in 10% of old previously seen data with new generated data from last gen in a continuous bootstrap.