It should be easy to test for. An LLM that is actually playing chess tries to predict the most likely continuation of the partial game it is given, and it has been shown that this includes internally estimating the strength of the players in order to predict comparably strong or weak moves.
If the LLM is just a pass-through to a chess engine, then it is more likely to play at the same strength all the time.
It's not clear from the linked article how many moves the LLM was given before being asked to continue, or whether these were all grandmaster games. If the LLM still crushes it when asked to continue a half-played, poor-quality game, that would be a good indication it's not an LLM making the moves, since a real LLM would be smart enough to match the poor quality of play.
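For what it's worth, this probe is only a few dozen lines with python-chess and a local Stockfish binary. A rough sketch, not a tested harness: ask_llm_for_move() is a hypothetical stand-in for whatever API fronts the model under test, and the search depth and ply counts are arbitrary.

    import io

    import chess
    import chess.engine
    import chess.pgn


    def ask_llm_for_move(pgn_so_far: str) -> str:
        """Hypothetical stand-in: prompt the model under test with the
        partial game and return the single SAN move it predicts next."""
        raise NotImplementedError


    def centipawn_loss(board: chess.Board, move: chess.Move,
                       engine: chess.engine.SimpleEngine, depth: int = 15) -> int:
        """How much worse (in centipawns) the played move is than Stockfish's best."""
        limit = chess.engine.Limit(depth=depth)
        best = engine.analyse(board, limit)["score"].relative.score(mate_score=10000)
        board.push(move)
        # After the move the score is from the opponent's perspective, so negate it.
        played = -engine.analyse(board, limit)["score"].relative.score(mate_score=10000)
        board.pop()
        return best - played


    def probe(partial_games: list[tuple[str, str]], n_moves: int = 10) -> None:
        """Ask the model to continue each half-played game for n_moves plies
        and report its average centipawn loss per game."""
        with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
            for label, pgn in partial_games:
                game = chess.pgn.read_game(io.StringIO(pgn))
                node = game.end()
                board = node.board()
                losses = []
                for _ in range(n_moves):
                    move = board.parse_san(ask_llm_for_move(str(game)))
                    losses.append(centipawn_loss(board, move, engine))
                    node = node.add_main_variation(move)  # keep the PGN prompt current
                    board.push(move)
                print(f"{label}: avg centipawn loss {sum(losses) / len(losses):.0f}")

Run it on a mix of half-played grandmaster games and deliberately sloppy ones: near-zero loss across the board points to an engine behind the curtain, while loss that tracks the quality of the preceding play points to a real LLM.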
LLMs have this unique capability, yet every AI company seems hell-bent on making them... not have it.
I want the essence of this unique aspect, but better; not this aspect diluted with others, such as the pure logical perfection of ordinary computer software. I already have that!
The problem with every extant AI company is that they're trying to make finished, integrated products instead of components.
It's as if you just wanted a database engine and every database vendor insisted on selling you a shopfront web app that also happens to include a database in there somewhere.