These servers are fielding millions of queries, and the test questions are a small part of them. Further, the server doesn't know whether it got the right answer or not, so it can't even train on them. In the pre-release arrangement with the testing companies, by contrast, they potentially can, as they are given the scores.