Interesting to see how bad the physics/object permanence is. I wonder if combining this with a Genie 2-type model (Google's new "world model") would be the next step in refining its capabilities.
This feels like computer graphics and the 'screen space' techniques that were introduced in the Xbox 360 generation: reflections, shadows, etc. all suffered from the inability to work with off-screen information and gave wildly bad answers once off-screen info was required.
The solution was simple: just maintain the information in world space, and sample from that. But simple does not mean cheap, and it led to a ton of redundant data (as in invisible in the final image) having to be kept track of.
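To make the screen space vs. world space point concrete, here is a toy sketch (not how any real renderer does it). The 1D "scene", the `WORLD` dict, and all function names are made up for illustration; the only point is that a lookup restricted to what is on screen has no answer for anything off screen.

```python
# Toy 1D "scene": world position -> color. All names here are hypothetical.
WORLD = {-2: "red", 0: "green", 1: "blue", 5: "yellow"}

def render_screen(camera_x, width):
    """Keep only what falls inside the view window [camera_x, camera_x + width)."""
    return {x: c for x, c in WORLD.items() if camera_x <= x < camera_x + width}

def reflect_screen_space(screen, hit_x, fallback="black"):
    """Screen-space lookup: can only answer from what was already rendered."""
    return screen.get(hit_x, fallback)

def reflect_world_space(hit_x, fallback="black"):
    """World-space lookup: consults the full scene, visible or not."""
    return WORLD.get(hit_x, fallback)

screen = render_screen(camera_x=0, width=3)

on_screen = reflect_screen_space(screen, 1)     # the hit point is visible
off_screen = reflect_screen_space(screen, 5)    # the hit point is off screen, so we get the fallback
true_answer = reflect_world_space(5)            # world space still knows about it
```

The cost shows up in `WORLD` itself: it has to hold entries like `-2` and `5` that never appear in the final image.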
It's a pretty binary thing in the sense that "bad physics" pretty quickly decoheres into no physics.
I saw one of these models doing a Minecraft-like simulation and it looked sort of okay, but then water started to end up in impossible places, and once it was there it kept spreading and you ended up in some Lovecraftian horror dimension. Any useful physics simulation at least needs boundary conditions to hold, and these models have no boundary conditions because they have no clear categories of anything.
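As a rough illustration of what "boundary conditions hold" means, here is a minimal sketch of water spreading along a 1D row of cells. The cell symbols and the `step` function are invented for this example and have nothing to do with how Minecraft or any of these models actually work; the point is only that the rule depends on a hard category ("this cell is a wall").

```python
def step(cells):
    """One spreading step on a 1D row: 'W' water, '#' wall, '.' air."""
    out = list(cells)
    for i, c in enumerate(cells):
        if c != "W":
            continue
        for j in (i - 1, i + 1):
            # Boundary condition: stay inside the grid and never overwrite a wall.
            if 0 <= j < len(cells) and cells[j] == ".":
                out[j] = "W"
    return "".join(out)

row = "#..W.#.."
for _ in range(5):
    row = step(row)
```

The water fills the space between the two walls and then stops, no matter how many more steps you run. Drop the `cells[j] == "."` check and it floods everything, which is roughly the failure described above.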
Here’s an idea: what if the fact that we have a body that has weight and consequence helps us understand physics? What if visual data alone won’t get there because it lacks a sense of self? Could be interesting.
Not consistently though. I think some understanding of physics is emergent, but it doesn’t seem emergent enough. The model doesn’t understand object permanence either.