Interesting, but the video is only available at 480p for now, so it's hard to actually see any details in the demo. I assume this was uploaded very recently and YT is still processing the higher-quality versions.
Forgive me if I have this all wrong, but in summary it seems that they have created a faster algorithm for building a 3D representation of a scene (made of voxels, not polygons) from a set of 2D photos of that scene.
The actual representation uses voxels to index a spherical harmonic encoding of scene detail (I think), and it includes transparency, so the novel view synthesis accumulates light along the length of each ray. This lets it capture transparency and other lighting effects that aren't possible with simple photogrammetry.
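To make that concrete, here's roughly how I picture the rendering side: each voxel stores an opacity plus spherical harmonic coefficients, and a ray sums view-dependent color weighted by how much light survives to that depth. This is just a toy NumPy sketch of my reading, not the paper's code; the grid size, SH degree, step count, and function names are all made up for illustration, and it assumes the scene sits in the unit cube.

    import numpy as np

    GRID = 128                  # voxels per side (made-up size)
    SH_COEFFS = 9               # degree-2 real spherical harmonics per color channel
    density = np.zeros((GRID, GRID, GRID))            # opacity value per voxel
    sh = np.zeros((GRID, GRID, GRID, 3, SH_COEFFS))   # SH coefficients per voxel, per RGB channel

    def sh_basis(d):
        """Standard degree-2 real spherical harmonic basis for a unit direction d."""
        x, y, z = d
        return np.array([
            0.282095,
            0.488603 * y, 0.488603 * z, 0.488603 * x,
            1.092548 * x * y, 1.092548 * y * z,
            0.315392 * (3 * z * z - 1),
            1.092548 * x * z, 0.546274 * (x * x - y * y),
        ])

    def render_ray(origin, direction, n_steps=256, step=0.01):
        """Accumulate color along one ray, front to back."""
        basis = sh_basis(direction)
        color = np.zeros(3)
        transmittance = 1.0                       # fraction of light still unblocked
        for i in range(n_steps):
            p = origin + direction * (i * step)
            idx = tuple(np.clip((p * GRID).astype(int), 0, GRID - 1))
            alpha = 1.0 - np.exp(-density[idx] * step)    # how much this sample occludes
            rgb = np.clip(sh[idx] @ basis, 0.0, 1.0)      # view-dependent color from SH
            color += transmittance * alpha * rgb          # light "accreted" at this depth
            transmittance *= 1.0 - alpha
            if transmittance < 1e-3:                      # ray is effectively opaque; stop
                break
        return color

Unlike NeRF there's no neural network to query at every sample, just grid lookups, which is presumably where a lot of the speedup comes from.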
It seems to be a faster way of calculating differentiable volumetric geometry from a scene (whether the input is computer-generated or captured from the real world). I didn't get it at all until I had explored the linked NeRF site.
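And the "differentiable" part, as I understand it, is what lets them fit the voxel values directly by gradient descent on the reconstruction error against the training photos. A one-voxel toy version of that idea (all names and numbers here are mine, not from the paper):

    import numpy as np

    def render_alpha(sigma, step=0.01, n_steps=256):
        """Opacity accumulated by a ray crossing a region of constant density sigma."""
        return 1.0 - np.exp(-sigma * step * n_steps)

    def d_render_alpha(sigma, step=0.01, n_steps=256):
        """Analytic derivative of render_alpha w.r.t. sigma -- the 'differentiable' bit."""
        return step * n_steps * np.exp(-sigma * step * n_steps)

    target = 0.8    # opacity the training photo says this pixel should have (made up)
    sigma = 0.1     # initial density guess
    lr = 0.2        # learning rate (also made up)

    for _ in range(200):
        err = render_alpha(sigma) - target
        sigma -= lr * 2.0 * err * d_render_alpha(sigma)     # gradient of squared error

    print(round(sigma, 3), round(render_alpha(sigma), 3))   # ~0.629, ~0.8

Scale that up to every voxel and every training pixel and you get the optimization loop; in practice it's gradients over the whole grid at once rather than one value at a time.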
No, AFAICT it's actually building a more detailed, ray-traced 3D representation of the geometry in a way that can be rendered in real time from any angle and with realistic lighting.
Per the paper it's not quite real time (15 fps), and they suggest converting to a different representation (PlenOctrees) for real-time rendering.
But yes, it's far more realistic than your simple mesh-and-texture; the video shows it handling reflections and transparency. When NeRF came out (~2020?) it was a major step forward in light field quality, and this technique appears to maintain that quality.
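On the PlenOctrees conversion mentioned above, my loose (possibly wrong) mental model is that the trained grid gets baked into a sparse octree, so empty space collapses into a few large nodes and a lookup only walks a handful of levels. A toy point lookup just to show the shape of that idea (entirely my own sketch, not their actual data structure):

    class OctNode:
        def __init__(self, value=None, children=None):
            self.value = value        # baked (density, SH coefficients) at a leaf
            self.children = children  # list of 8 sub-nodes for an internal node

    def lookup(node, x, y, z):
        """Walk the octree to the leaf containing point (x, y, z) in the unit cube."""
        ox = oy = oz = 0.0
        size = 1.0
        while node.children is not None:
            size /= 2.0
            ix, iy, iz = x >= ox + size, y >= oy + size, z >= oz + size
            ox, oy, oz = ox + ix * size, oy + iy * size, oz + iz * size
            node = node.children[(ix << 2) | (iy << 1) | int(iz)]
        return node.value

    # Toy tree: one level deep, everything empty except one octant
    leaf = OctNode(value=(5.0, "sh coefficients here"))
    empty = OctNode(value=(0.0, None))
    root = OctNode(children=[empty] * 7 + [leaf])
    print(lookup(root, 0.9, 0.9, 0.9))   # lands in the occupied octant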