Hacker News new | past | comments | ask | show | jobs | submit login

The reward function could be "pass all of these tests I just wrote".



Lol. Literally.

If you have those many well written tests, you can pass them to a constraint solver today and get your program. No LLM needed.

Or even run your tests instead of the program.


Probably the parent assumes that he does have the tests, billions of them.

One very strong LLM could generate billions of tests alongside the working code and then train another smaller model, or feed it into the next iteration of training same the strong model. Strong LLMs do exist for that purpose, Nemotron 320B and Llama 3 450B.

It would be interesting if a dataset like that would be created like that, and then released as open source. Many LLMs proprietary or not, could incorporate the dataset in their training, and have on the internet hundreds of LLMs suddenly become much better at coding, all of them at once.


You cannot


After much RL, the model will just learn to mock everything to get the test to pass.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: