
Discussion on HF [1] suggests that no, conversion is not helpful; it would take training the model from scratch.

1: https://huggingface.co/papers/2402.17764
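
For context, the paper quantizes weights to ternary values {-1, 0, 1} with an "absmean" scale. A rough sketch of that quantization step, based on my reading of the paper rather than any released code:

    import torch

    def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
        # Scale by the mean absolute value of the weight matrix ("absmean").
        gamma = w.abs().mean()
        # Round to the nearest integer, then clip into {-1, 0, 1}.
        w_q = (w / (gamma + eps)).round().clamp_(-1, 1)
        return w_q, gamma  # dequantize later as w_q * gamma

Naively applying this to a pretrained full-precision checkpoint destroys most of the information in the weights, which is presumably why the conclusion is that the quantizer has to be in the loop during training rather than applied after the fact.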




It’s a pity if realizing these gains absolutely requires full pre-training from scratch. I imagine more than a few people will at least try to find a way to repurpose the knowledge contained in existing models.


You can also have another model "mentor" the new model you are training to speed things up, so you don't have to start from scratch with zero knowledge. This technique is called knowledge distillation and is used a lot.
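
A minimal sketch of the classic soft-target setup (Hinton-style distillation; the function name and hyperparameters here are just illustrative):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        # Soft targets: KL divergence between temperature-softened
        # student and teacher output distributions.
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)
        # Hard targets: ordinary cross-entropy against the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

The teacher only runs inference, so even if the student uses a completely different weight format (e.g. ternary), the teacher's full-precision knowledge can still guide training.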


You can also re-use a lot of the infrastructure, e.g. your training data.


This came out a little while ago; my open question is whether its approach can be used to port weights between architectures like this.

https://arxiv.org/abs/2402.13144



