It would be a pity if realizing these gains required full pre-training from scratch. I imagine more than a few people will at least try to find a way to repurpose the knowledge contained in existing models.
You can also have an existing model "mentor" a new model to speed up its training, so you don't have to start from zero knowledge. This is the core idea behind knowledge distillation, which is used all over the place.
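For the curious, here's roughly what that looks like mechanically: a minimal PyTorch sketch of the classic "soft label" distillation recipe (Hinton et al.), where a frozen teacher's softened output distribution supervises the student alongside the usual hard-label loss. The model sizes, temperature, and loss weighting here are made-up toy values, not anything from a real training run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: in practice the teacher is a large pretrained model
# and the student is the smaller (or new) model being trained.
teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
teacher.eval()  # the teacher is frozen; only the student learns

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0       # temperature: softens the teacher's distribution
alpha = 0.5   # mix between distillation loss and hard-label loss

def distill_step(x, y):
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)

    # KL divergence between the softened teacher and student distributions
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures

    # Ordinary cross-entropy against the ground-truth labels
    hard_loss = F.cross_entropy(student_logits, y)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# One step on random data, just to show the mechanics
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
print(distill_step(x, y))
```

The point of the soft targets is that the teacher's full probability distribution carries more information per example than a bare label does, which is a big part of why the student trains faster than it would from scratch.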