They also changed the ratio of RL experience collectors to GPU workers (~1/20th the RL experience collectors but 1/2 the GPUs, so roughly a tenth as many collectors per GPU). I don't know what impact that has. Maybe each GPU gets less experience per episode? Maybe that amounts to an effectively smaller batch size, and therefore more chaotic training? But either way, why change things when you can just match them exactly?
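To make the batch-size worry concrete, here is a rough back-of-the-envelope sketch. It assumes (my assumption, not something stated in either setup) that each collector contributes a fixed number of samples per update and that those samples are split evenly across GPUs. The baseline counts (2000 collectors, 64 GPUs, 256 samples per collector) and the helper `samples_per_gpu` are purely hypothetical, chosen only to show the scaling.

```python
from fractions import Fraction

# Relative scaling from the post: ~1/20th the experience collectors, 1/2 the GPUs.
collector_scale = Fraction(1, 20)
gpu_scale = Fraction(1, 2)

# How collectors-per-GPU changes relative to the original setup.
collectors_per_gpu_scale = collector_scale / gpu_scale


def samples_per_gpu(num_collectors, num_gpus, samples_per_collector):
    """Hypothetical model: every collector contributes `samples_per_collector`
    samples per update, divided evenly across `num_gpus` GPUs."""
    return num_collectors * samples_per_collector / num_gpus


# Hypothetical baseline vs. the scaled-down setup (same per-collector output).
original = samples_per_gpu(2000, 64, 256)
scaled = samples_per_gpu(100, 32, 256)

print(collectors_per_gpu_scale)  # 1/10
print(original, scaled)          # 8000.0 800.0
```

Under that assumption, each GPU's per-update batch shrinks by the same 10x factor as the collectors-per-GPU ratio. If the real setups instead lengthened rollouts or accumulated experience across steps to compensate, the effective batch size could be unchanged.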