The reward function could be "pass all of these tests I just wrote".

marcosdumay · 2024-08-08T13:00:25 1723122025

Lol. Literally.

If you have those many well written tests, you can pass them to a constraint solver today and get your program. No LLM needed.

Or even run your tests instead of the program.

emporas · 2024-08-08T16:13:37 1723133617

Probably the parent assumes that he does have the tests, billions of them.

One very strong LLM could generate billions of tests alongside the working code and then train another smaller model, or feed it into the next iteration of training same the strong model. Strong LLMs do exist for that purpose, Nemotron 320B and Llama 3 450B.

It would be interesting if a dataset like that would be created like that, and then released as open source. Many LLMs proprietary or not, could incorporate the dataset in their training, and have on the internet hundreds of LLMs suddenly become much better at coding, all of them at once.

guipsp · 2024-08-19T14:06:36 1724076396

You cannot

acchow · 2024-08-08T21:23:42 1723152222

After much RL, the model will just learn to mock everything to get the test to pass.