I was playing with this for a few weeks over the holiday break. This is one of the GS3D sample scenes running on PCVR at about 65 FPS. I'm sorting on the CPU at the moment, so there are some hitches, but it works! I may publish this as a Unity asset. (I'd love to get it working on Vision Pro, but we'll see.)
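For anyone curious, "sorting on the CPU" means roughly this: every frame the splats get ordered back-to-front from the current eye position so the alpha blending comes out right. A minimal numpy sketch of the idea (not the actual project code; the names are made up):

```python
import numpy as np

def sort_splats_back_to_front(positions, view_matrix):
    """Return splat indices ordered far-to-near for correct alpha blending.

    positions:   (N, 3) array of splat centres in world space
    view_matrix: (4, 4) world-to-camera matrix for the current eye
    """
    # Transform centres into view space; the camera looks down -Z here.
    ones = np.ones((positions.shape[0], 1))
    view_pos = (np.hstack([positions, ones]) @ view_matrix.T)[:, :3]
    depths = -view_pos[:, 2]          # distance along the view direction
    return np.argsort(-depths)        # farthest splats drawn first

# Re-doing this every frame for hundreds of thousands of splats is what causes the hitches.
splat_centres = np.random.rand(100_000, 3)
order = sort_splats_back_to_front(splat_centres, np.eye(4))
```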
Chris' post doesn't really give much background info, so here's what's going on here and why it's awesome.
Real-time 3D rendering has historically been based on rasterisation of polygons. This has brought us a long way and has a lot of advantages, but making photorealistic scenes takes a lot of work from artists. You can scan real objects with photogrammetry and then convert them to high-poly meshes, but photogrammetry rigs are pro-level tools, and the resulting assets won't render at real-time speeds. Unreal 5 introduced Nanite, a very advanced LoD system, which helps a lot, but we seem to be hitting the limits of what can be done with polygon-based rendering.
3D Gaussian Splatting is a new AI-based technique that lets you render photorealistic 3D scenes in real time, captured from a set of photos taken with normal cameras. It replaces polygon-based rendering with radiance fields.
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
3DGS uses several advanced techniques:
1. A 3D point cloud is estimated using "structure from motion" techniques.
2. The points are turned into "3D Gaussians", which are sort of floating blobs of light. Each one has a position, an opacity, a covariance matrix built from a scale and rotation (so they're ellipsoids: spheres that have been stretched and rotated), and a view-dependent colour defined using "spherical harmonics" (no, me neither).
3. Rendering is done by splatting: the 3D Gaussians are projected onto the 2D screen (into "splats"), sorted by depth so transparency works, and then rasterised on the fly using custom shaders (see the sketch just after this list).
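To make that a bit more concrete, here's a rough Python sketch of the per-splat maths behind steps 2 and 3 (heavily simplified: it skips the projection of the 3D covariance into screen space and the tile-based sorting, and the names are just illustrative):

```python
import numpy as np

def covariance_3d(scale, quaternion):
    """Covariance of one Gaussian from its scale and rotation: Sigma = R S S^T R^T.
    This is what makes each splat a stretched, rotated ellipsoid."""
    w, x, y, z = quaternion
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

def splat_weight(pixel, centre_2d, cov_2d):
    """Gaussian falloff of a projected splat at a pixel; multiplied by the
    splat's opacity, this gives the alpha used in the depth-sorted blend."""
    d = np.asarray(pixel, dtype=float) - centre_2d
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov_2d) @ d))
```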
No neural network has to be evaluated at rendering time (the machine learning only optimises the Gaussians during training), so GPUs can render the scene nice and fast.
In terms of what it can do, the technique is perhaps comparable to Unreal's Nanite: both are designed for static scenes. Whilst 3D Gaussians can be moved around on the fly, so the scene can in principle be changed, none of the existing animation, game-engine or art packages know what to do without polygons. But this sort of thing could be used to rapidly create VR worlds from just videos taken from different angles, which seems useful.
This is great – thanks for explaining it all, Mike!
In the research paper, the authors compare GS3D to NeRFs: the scenes train much more quickly (perhaps 30 minutes per scene) and render in real time. So it's really a huge upgrade over NeRFs and photogrammetry.
I think there's a lot of potential to use GS3D for AR/VR avatars, and also for Stable Diffusion-style scene generation (where a more complete scene is generated from one or two photos, or imagined entirely by AI).
Very cool work! Are there papers or repos that do fast splat generation of digitally-originated assets?
I'm wondering if there is a way to embed digitally-originated assets in the scene and render them using the same splat drawing pipeline you're using to render your photographically-originated assets?
Huh, that's a neat idea. I haven't personally seen any papers that explore converting traditional 3D art into splats, but it should be possible. At a very basic level: you could render a 3D item on a black background, capture it from a few different angles, and then generate splats using the conventional approach. I'm sure there would be even more optimal ways to do it, too.
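As a sketch of that capture idea (assuming you can script your renderer's camera; the rendered views would then go through the usual SfM + 3DGS training tools):

```python
import numpy as np

def orbit_camera_positions(radius=2.0, n_azimuth=24, elevations_deg=(-15, 15, 45)):
    """Camera positions on rings around an object at the origin.

    Each position would be paired with a look-at-origin orientation, rendered
    against a plain background, and the images fed to a conventional
    structure-from-motion + Gaussian splat training pipeline.
    """
    positions = []
    for elev in np.radians(elevations_deg):
        for az in np.linspace(0.0, 2.0 * np.pi, n_azimuth, endpoint=False):
            positions.append([
                radius * np.cos(elev) * np.cos(az),
                radius * np.sin(elev),
                radius * np.cos(elev) * np.sin(az),
            ])
    return np.array(positions)  # (len(elevations_deg) * n_azimuth, 3)
```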
Having said that, you can absolutely blend standard opaque geometry with the splat environment. I just mentioned this in another comment, but I tested it, and it worked quite well with opaque geometry: https://youtu.be/8uB4qC2UtCY
I was thinking about whether you could also render deforming geometry that way. My guess was it would integrate better into the scene visually, but my intuition there isn't well informed yet.
Thanks! Yes, but only opaque objects. Transparent objects would be hard to fit into the rendering order without order-independent transparency.
I did a version with stochastic alpha to coverage, which would work well alongside both opaque and transparent objects, but it made the splats look very grainy.
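For anyone unfamiliar with the trade-off: sorted blending weights each splat by how much light makes it through the splats in front of it, whereas the stochastic approach keeps or kills each fragment at random with probability alpha, which is order-independent but noisy, hence the grain. A toy sketch of the two ideas (not the actual shader code):

```python
import numpy as np

rng = np.random.default_rng(0)
colors = np.array([[1.0, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 1.0]])  # near -> far
alphas = np.array([0.5, 0.5, 0.5])

def sorted_over(colors, alphas):
    """Exact front-to-back 'over' compositing (needs depth-sorted fragments)."""
    out, transmittance = np.zeros(3), 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * c
        transmittance *= 1.0 - a
    return out

def stochastic(colors, alphas, samples=4):
    """Order-independent: each fragment survives with probability alpha, and the
    nearest survivor wins each sample. Correct on average, but with only a few
    samples per pixel the result is visibly grainy."""
    accum = np.zeros(3)
    for _ in range(samples):
        kept = rng.random(len(alphas)) < alphas
        idx = np.argmax(kept) if kept.any() else None
        if idx is not None:
            accum += colors[idx]
    return accum / samples

print(sorted_over(colors, alphas), stochastic(colors, alphas))
```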
Very interesting, thanks! I'd love to see some non-realistic objects in a scene like that to get an idea of what it looks like. Would be a cool video idea for the channel :)
Ahh, I'm unlikely to publish any articles, sorry! But Aras has a great Unity renderer for non-VR platforms, if you want to play around with an existing implementation: https://github.com/aras-p/UnityGaussianSplatting
His implementation runs incredibly well on both my Mac and PC.