Presumably the parent assumes you do have the tests, billions of them.
One very strong LLM could generate billions of tests alongside the working code, then use them to train a smaller model, or feed them into the next training iteration of the same strong model. Strong LLMs built for exactly that purpose already exist: Nemotron-4 340B and Llama 3.1 405B.
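A minimal sketch of what that generate-and-verify loop could look like, assuming a hypothetical `llm_generate` call into whatever strong model you're running; only pairs whose tests actually pass when executed get kept as training data:

```python
import subprocess
import sys
import tempfile

def llm_generate(prompt: str) -> str:
    # Hypothetical inference call: wire this up to whatever strong
    # model you are running (Nemotron-4 340B, Llama 3.1 405B, ...).
    raise NotImplementedError

def make_verified_pair(task: str) -> tuple[str, str] | None:
    """Generate a solution plus unit tests for `task`; keep the pair
    only if the tests actually pass."""
    code = llm_generate(f"Write a Python function solving: {task}")
    tests = llm_generate(f"Write unittest.TestCase tests for:\n{code}")
    # Run code and tests together in a fresh interpreter so one bad
    # sample can't poison the rest of the pipeline.
    script = code + "\n\n" + tests + "\n\nimport unittest\nunittest.main()\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py") as f:
        f.write(script)
        f.flush()
        try:
            result = subprocess.run(
                [sys.executable, f.name],
                capture_output=True,
                timeout=60,  # guard against generated infinite loops
            )
        except subprocess.TimeoutExpired:
            return None
    return (code, tests) if result.returncode == 0 else None

def build_dataset(tasks: list[str]) -> list[tuple[str, str]]:
    # Only verified (code, tests) pairs become fine-tuning data for a
    # smaller model, or for the next iteration of the big one.
    pairs = (make_verified_pair(t) for t in tasks)
    return [p for p in pairs if p is not None]
```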
It would be interesting if such a dataset were created that way and then released as open source. Many LLMs, proprietary or not, could fold it into their training, and hundreds of models across the internet would suddenly get much better at coding, all at once.