I work on a much easier problem (physics-based character animation) after spending a few years in motion planning, and I haven’t really seen anything to suggest that the problem is going to be solved any time soon by collecting more data.
"We present Dreamer 4, a scalable agent that learns to solve control tasks by imagination training inside of a fast and accurate world model. ... By training inside of its world model, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, aligning it with applications such as robotics where online interaction is often impractical."
In other words, it learns by watching, e.g. by having more data of a certain type.
I am pushing the optimism a bit of course, but currently we can see many demos of robots doing basic tasks, and it seems like it is quite easy nowadays to do this with the data driven approach.
The problem becomes complicated once the large discrete objects are not actuated. Even worse if the large discrete objects are not consistently observable because of occlusions or other sensor limitations. And almost impossible if the large discrete objects are actuated by other agents with potentially adversarial goals.
Self driving cars, an application in which physics is simple and arguably two dimensional, have taken more than a decade to get to a deployable solution.
Next to zero cognition was involved in the process. There's some kind of hierarchy of thought in the way my mind/brain/body processed the task. I did cognitively decide to get the beer, but I was focused on something at work and continued to think about that in great detail as the rest of me did all of the motion planning and articulation required to get up, walk through two doorways, open the door on the fridge, grab a beer, close the door, walk back and crack the beer as I was sitting down.
Basically zero thought in that entire sequence.
I think what's happening today with all of this stuff is ultimately like me trying to play Fur Elise on piano. I don't have a piano. I don't know how to play one. I'm going to be all brain in that entire process and it's going to be awful.
We need to learn how to use the data we have to train these layers of abstraction that allow us to effectively compress tons of sophistication into 'get a beer'.