But that experiment was explicitly limited to 8 hours of training for every setup. Is there another paper showing that you can't just train a smaller GPU setup for more hours to compensate?
They also changed the ratio of RL experience collectors to GPU workers (~1/20th the experience collectors, but 1/2 the GPUs, so roughly a 10x drop in collectors per GPU). I don't know what impact that has: maybe each GPU sees less fresh experience per update? Maybe that amounts to an effectively smaller batch size and therefore noisier training? Either way, why change these things when you could just match the original setup exactly?
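A quick back-of-the-envelope sketch of why that ratio change might matter. All the absolute counts and rates below are made-up placeholders, not the paper's actual configuration; only the 1/20 and 1/2 scaling factors come from the comment above:

```python
# Hedged sketch: how the collector-to-GPU ratio affects the fresh
# experience available per gradient update. Numbers are hypothetical;
# only the 1/20 (collectors) and 1/2 (GPUs) factors are from above.

def experience_per_gpu(num_collectors: int, num_gpus: int,
                       steps_per_collector_per_sec: float = 100.0) -> float:
    """Fresh environment steps arriving per GPU per second."""
    return num_collectors * steps_per_collector_per_sec / num_gpus

# Hypothetical original setup.
orig = experience_per_gpu(num_collectors=2000, num_gpus=200)

# Scaled-down setup: ~1/20th the collectors, 1/2 the GPUs.
scaled = experience_per_gpu(num_collectors=2000 // 20, num_gpus=200 // 2)

print(f"original: {orig:.0f} steps/GPU/sec")
print(f"scaled:   {scaled:.0f} steps/GPU/sec")
print(f"ratio:    {orig / scaled:.1f}x less fresh data per GPU")
# -> 10x less fresh experience per GPU per second. So either the
#    effective batch of new data shrinks ~10x, or each sample gets
#    reused ~10x more (more off-policy), and either could plausibly
#    make training noisier.
```

The point of the arithmetic is just that the two changes don't cancel: halving the GPUs while cutting collectors 20x still leaves each GPU with an order of magnitude less fresh data per second, whatever the absolute numbers are.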