I'm unconvinced that actually making the content is the problem. We had SketchUp fifteen years ago and that made it stupid easy to make some really killer 3d content. I don't think being able to use natural language removes the barriers to it taking off.
The real problem isn't the shapes (there are plenty of free meshes out there), it's making them interactive, which is a complicated problem that extends well beyond the capabilities of our current generation of LLMs. Just look at Roblox: it's successful because you can make the world do stuff. But we can be pretty sure that just making VR/AR Roblox doesn't make a killer app.
The challenge that a killer app for VR/AR needs to overcome is being more than a glorified art gallery.
That's not really interactive. A NeRF of a basketball doesn't know how to bounce. Giving objects physics, texture, compressibility, the ability to hinge or bend, etc., is all well beyond what models today can do.
It's as interactive as that key press or mouse click you implemented in your reply. NeRFs are an intermediary step. So are motion capture and physical simulation, which we already know how to do. Folks on this side of the uncanny valley will nitpick everything rendered, while everyone else is on the other side talking about the photorealism being generated.
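To put "we already know how to do" in perspective, here's a minimal sketch of the bouncing-basketball physics the earlier reply brought up: semi-implicit Euler plus a coefficient of restitution, in Python, with made-up values for gravity, timestep, and bounciness. It's the kind of thing game engines have shipped for decades; the open question is wiring this behavior up to generated content, not the simulation itself.

    # Minimal sketch of the physics a "bouncing basketball" needs:
    # semi-implicit Euler integration with a coefficient of restitution.
    # All constants here are illustrative assumptions, not any engine's real values.

    GRAVITY = -9.81       # m/s^2
    RESTITUTION = 0.75    # fraction of speed kept after each bounce (assumed)
    DT = 1.0 / 60.0       # 60 Hz simulation step

    def step(height, velocity):
        """Advance the ball one timestep; bounce when it hits the floor."""
        velocity += GRAVITY * DT               # apply gravity to velocity
        height += velocity * DT                # integrate position with new velocity
        if height <= 0.0 and velocity < 0.0:   # hit the floor while moving down
            height = 0.0
            velocity = -velocity * RESTITUTION # reverse and damp on impact
        return height, velocity

    # Drop the ball from 2 m and watch the bounces decay.
    h, v = 2.0, 0.0
    for frame in range(240):                   # 4 seconds at 60 Hz
        h, v = step(h, v)
        if frame % 30 == 0:
            print(f"t={frame * DT:.1f}s  height={h:.2f}m")

Run it and the printed heights decay bounce by bounce, which is exactly the behavior a rendered radiance field alone can't express.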