How does turning an image into a game help with robots? Robots don't need to guess what they can't see; they would have sensors to tell them exactly what is there (like a self-driving car).
To plan ahead, robots absolutely do need to "guess" (or even "imagine") what they might encounter before they sense it. In your self-driving car example, the car needs to come up with various scenarios for what might be around the corner before it makes a turn, and assign reasonable probabilities to those scenarios. I can definitely see how a system like this could help with that.
For example, say the car is approaching an intersection and suddenly sees a puddle on the road to its left getting brighter. A visual world model like this might extrapolate one scenario in which the brightness comes from a car moving toward the intersection and assign it some probability, assign another probability to a scenario in which it's just a flickering headlight, and the car would then decide whether, and how much, to slow down.
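To make the "assign probabilities, then decide how much to slow down" step concrete, here is a toy sketch of picking the action with the lowest expected cost across imagined scenarios. The scenario names, probabilities, and costs are all made up for illustration; a real system would get the probabilities from its world model and the costs from its planner.

```python
# Hypothetical scenario probabilities (would come from the world model).
SCENARIOS = {
    "car_approaching": 0.3,
    "flickering_headlight": 0.7,
}

# Hypothetical cost of each action under each scenario (lower is better).
COSTS = {
    "keep_speed": {"car_approaching": 100.0, "flickering_headlight": 0.0},
    "slow_down": {"car_approaching": 5.0, "flickering_headlight": 2.0},
    "stop": {"car_approaching": 1.0, "flickering_headlight": 10.0},
}


def expected_cost(action, probs, costs):
    """Probability-weighted cost of taking `action` over all scenarios."""
    return sum(p * costs[action][scenario] for scenario, p in probs.items())


def choose_action(probs, costs):
    """Return the action whose expected cost is lowest."""
    return min(costs, key=lambda action: expected_cost(action, probs, costs))


print(choose_action(SCENARIOS, COSTS))  # slow_down
```

With these numbers the expected costs are 30 for keeping speed, 2.9 for slowing down, and 7.3 for stopping, so the car slows down. If the model thinks an approaching car is very likely (say 0.9), stopping wins instead; if it is very unlikely (say 0.01), keeping speed wins.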
In this example there is a sensor, but it definitely doesn't tell the robot "exactly what is there". We could try to hand-write rules for what it should do, but the Bitter Lesson tells us it's better to just let it learn its own model.
I have no expertise in this area, but my assumption is that this could help with a broader kind of object/world permanence for robots. For example, if something is no longer visible to the robot's sensors (because it's behind an obstacle, obscured by smoke, etc.), the robot could use a model based on this type of tech to maintain a short-term estimate of its surroundings even while operating blind.
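For contrast, here is the classic hand-built version of that idea: a constant-velocity track that keeps predicting where an occluded object is and widens its uncertainty every step it goes unobserved. This is a much simpler stand-in, not a learned visual world model, and the `growth` and `max_sigma` parameters are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Track:
    x: float      # estimated position along one axis (m)
    v: float      # estimated velocity (m/s)
    sigma: float  # position uncertainty, one standard deviation (m)


def predict(track: Track, dt: float, growth: float = 0.5) -> Track:
    """Propagate the track dt seconds with no new measurement.

    Assumes constant velocity; uncertainty grows linearly by `growth`
    metres per second (a hypothetical tuning value).
    """
    return Track(x=track.x + track.v * dt, v=track.v, sigma=track.sigma + growth * dt)


def is_stale(track: Track, max_sigma: float = 3.0) -> bool:
    """Treat the estimate as unreliable once uncertainty exceeds max_sigma."""
    return track.sigma > max_sigma


# An object moving at 2 m/s disappears behind an obstacle; predict 2.5 s ahead.
track = Track(x=0.0, v=2.0, sigma=0.2)
for _ in range(5):
    track = predict(track, dt=0.5)
print(track.x, round(track.sigma, 2), is_stale(track))  # 5.0 1.45 False
```

The point of the comparison is the Bitter Lesson argument above: this version only works for motions someone thought to model by hand, whereas a learned world model could in principle imagine richer continuations of what it last saw.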
> Robots don't need to guess what they can't see, they would have sensors to tell them exactly what is there (like a self driving car).
Self-driving cars have cameras as part of their sensor suite, and have models to make sense of the sensor data. Video will help with perception and classification (understanding the world), with no agency needed. Game-playing will help with planning, execution, and evaluation. Both functions are necessary, and the later capabilities build on the earlier ones.