
Exactly. Since it's not doing whole-program synthesis, I'm thinking it could be done with fewer parameters. However, program synthesis is part of the loss function.


Program synthesis is part of the loss function, which is what makes it an auxiliary learning task.
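For readers unfamiliar with the term: an auxiliary learning task adds a secondary, weighted term to the training loss. A minimal sketch (the function and weight names here are hypothetical, not from the paper):

```python
def total_loss(main_loss: float, synthesis_loss: float, aux_weight: float = 0.1) -> float:
    """Combine the primary objective with a weighted auxiliary
    program-synthesis term; aux_weight is a hypothetical knob that
    keeps the auxiliary signal from dominating the gradient."""
    return main_loss + aux_weight * synthesis_loss

# Example: the auxiliary term shifts the total only slightly.
print(total_loss(2.0, 5.0))  # 2.0 + 0.1 * 5.0 = 2.5
```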

We haven’t experimented with model size yet; we just used the same configuration as the smallest Code Llama. We did play with dataset size and found that performance tracks the usual scaling laws. Details in the paper.
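"Tracks the usual scaling laws" typically means loss falls roughly as a power law in dataset size, L(N) = a * N^(-b). A quick illustrative sketch (the constants a and b here are made up, not fitted to the paper's data):

```python
def scaling_law(n_tokens: float, a: float = 10.0, b: float = 0.1) -> float:
    """Power-law loss curve L(N) = a * N**(-b); a and b are
    hypothetical constants for illustration only."""
    return a * n_tokens ** (-b)

# A power law implies doubling the data multiplies loss by a
# constant factor of 2**(-b), regardless of where you start.
ratio = scaling_law(2e9) / scaling_law(1e9)
print(ratio)
```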


Thanks for the reply! This is really interesting research; I hope to see more from your team.





