But that experiment was explicitly limited to 8 hours of training for every setup. Is there another paper showing that you can't just train a smaller GPU setup for more hours to compensate?
They also changed the ratio of RL experience collectors to GPU workers (~1/20th the experience collectors, but 1/2 the GPUs, so roughly a 10x drop in collectors per GPU). I don't know what impact that has: maybe each GPU sees less fresh experience per update? Maybe that amounts to an effectively smaller batch size and therefore noisier training? Either way, why change these things when you could just match the original setup exactly?
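A quick back-of-the-envelope sketch of why that ratio change might matter. All the absolute counts and rates below are made-up placeholders, not the paper's actual configuration; only the 1/20 and 1/2 scaling factors come from the comment above:

```python
# Hedged sketch: how the collector-to-GPU ratio affects the fresh
# experience available per gradient update. Numbers are hypothetical;
# only the 1/20 (collectors) and 1/2 (GPUs) factors are from above.

def experience_per_gpu(num_collectors: int, num_gpus: int,
                       steps_per_collector_per_sec: float = 100.0) -> float:
    """Fresh environment steps arriving per GPU per second."""
    return num_collectors * steps_per_collector_per_sec / num_gpus

# Hypothetical original setup.
orig = experience_per_gpu(num_collectors=2000, num_gpus=200)

# Scaled-down setup: ~1/20th the collectors, 1/2 the GPUs.
scaled = experience_per_gpu(num_collectors=2000 // 20, num_gpus=200 // 2)

print(f"original: {orig:.0f} steps/GPU/sec")
print(f"scaled:   {scaled:.0f} steps/GPU/sec")
print(f"ratio:    {orig / scaled:.1f}x less fresh data per GPU")
# -> 10x less fresh experience per GPU per second. So either the
#    effective batch of new data shrinks ~10x, or each sample gets
#    reused ~10x more (more off-policy), and either could plausibly
#    make training noisier.
```

The point of the arithmetic is just that the two changes don't cancel: halving the GPUs while cutting collectors 20x still leaves each GPU with an order of magnitude less fresh data per second, whatever the absolute numbers are.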