When you build a new model, there is a spectrum of how you use the old model: 1....

skinner_ 5 months ago | parent | context | favorite | on: OpenAI says it has evidence DeepSeek used its mode...

When you build a new model, there is a spectrum of how you use the old model: 1. taking the weights, 2. training on the logits, 3. training on model output, 4. training from scratch. We don't know how much advantage #3 gives. It might be the case that with enough output from the old model, it is almost as useful as taking the weights.