> Theory 2: GPT-3.5-instruct was trained on more chess games.
Alternatively, somebody who prepared the training materials for this specific model had some spare time and decided to preprocess the games so that during training the model was only asked to predict the winning player's moves, and that individual whimsy was never repeated in the training of any other model.
Having seen bit rot in action, I totally buy this explanation. Some PhD did this in their spare time and then left, and when it didn't work in the gpt-4.0 training branch it just got commented out by someone else and forgotten.
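For what it's worth, the hypothesized preprocessing is simple to sketch. A minimal, purely illustrative version (the function name and the input format are my assumptions, not anything known about the actual pipeline): given a game's move list and its result, emit (context, next-move) training pairs only at plies where the eventual winner is to move.

```python
def winner_only_examples(moves, result):
    """Hypothetical sketch of the preprocessing described above.

    moves: SAN moves in order; result: '1-0', '0-1', or '1/2-1/2'.
    Returns (context, target_move) pairs only for the winning player,
    so the model is never trained to imitate the loser's moves.
    """
    if result == "1-0":
        winner_to_move = 0          # White moves on even plies
    elif result == "0-1":
        winner_to_move = 1          # Black moves on odd plies
    else:
        return []                   # drawn games contribute nothing
    examples = []
    for ply, move in enumerate(moves):
        if ply % 2 == winner_to_move:
            # context = everything played so far, target = winner's move
            examples.append((" ".join(moves[:ply]), move))
    return examples

game = ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Bxc6"]
for ctx, target in winner_only_examples(game, "1-0"):
    print(repr(ctx), "->", target)
```

If something like this ran once for one model's data and never again, it's exactly the kind of small, undocumented filter that could quietly disappear between training runs.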