Clearly the performance of the "instruct" LLM is due to some odd bug or other issue. I do not believe that it is fundamentally better at chess than the others, even if it was specifically trained on far more chess data, which is unlikely. Lack of correlation is also not causation.