
When you build a new model, there is a spectrum of ways to use the old one: 1. taking the weights, 2. training on the logits, 3. training on the model's output, 4. training from scratch. We don't know how much of an advantage #3 gives. It may be that, with enough output from the old model, it is almost as useful as taking the weights.
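
To make the difference between #2 and #3 concrete, here is a minimal sketch in plain Python. It assumes a toy 3-token vocabulary, made-up logits, and greedy decoding on the old model's side. The function names are hypothetical and are not taken from any library. With logits, the new model is scored against the old model's whole distribution; with output only, it sees the single token the old model emitted.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_distillation_loss(student_logits, teacher_logits):
    # Option 2: KL(teacher || student) over the full next-token distribution.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def output_distillation_loss(student_logits, teacher_token):
    # Option 3: cross-entropy against the one token the teacher emitted.
    q = softmax(student_logits)
    return -math.log(q[teacher_token])

# Toy values, assumed purely for illustration.
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [0.5, 0.5, 0.5]

# Greedy decoding: the teacher "outputs" its highest-scoring token.
teacher_token = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)

soft_loss = logit_distillation_loss(student_logits, teacher_logits)
hard_loss = output_distillation_loss(student_logits, teacher_token)
```

Each emitted token carries less information per example than the full logit vector, so #3 would plausibly need many more examples to convey the same distribution. How many more in practice is exactly the open question.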


