
They also changed the ratio of RL experience collectors to GPU workers (~1/20th the RL experience collectors, 1/2 the GPUs). I don't know what impact that has. Maybe each GPU ends up with less experience to train on? Maybe that makes for an effectively smaller batch size and therefore more chaotic training? But either way, why change things when you can just match them exactly?
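
To put a rough number on that ratio change, here is a back-of-envelope sketch. It assumes the ~1/20 and 1/2 figures are both relative to the original setup, and that experience throughput scales linearly with the number of collectors (that second part is my assumption, not something stated anywhere). The function name is made up for illustration.

```python
from fractions import Fraction

def collectors_per_gpu_scale(collector_scale, gpu_scale):
    """How the collectors-per-GPU ratio changes when each side is scaled.

    Both arguments are multipliers relative to the original setup.
    """
    return Fraction(collector_scale) / Fraction(gpu_scale)

# ~1/20th the experience collectors, 1/2 the GPUs (figures from the comment)
scale = collectors_per_gpu_scale(Fraction(1, 20), Fraction(1, 2))
print(scale)  # 1/10
```

So under those assumptions each GPU would be fed by roughly a tenth as many collectors as before. Whether that actually shows up as a smaller effective batch or noisier training depends on details I don't have.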

