It's a pretty basic "free for non-commercial use, contact us for commercial use" license.
And the Inria rasterizer is not proprietary either. It's non-commercial open source with the option to purchase a commercial license.
These are perfectly reasonable tech stacks for research projects to build off of. If you have an issue with the license, implement it yourself based on the papers (which all outline the necessary details to do so).
Rasterization is actually why 3D Gaussian Splats have been so successful. Being able to render 3DGS scenes by iterating over the objects and drawing the pixels each one covers is much faster than ray-marching every pixel, which is how neural radiance fields (the last hot 3D reconstruction technology) are rendered.
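To make the cost difference concrete, here's a toy 2D sketch (numpy; the density field, sizes, and numbers are made up, and this is nothing like the paper's actual tile-based CUDA rasterizer). Ray marching pays per pixel times per sample, while splatting pays roughly only for the pixels each Gaussian's footprint actually covers:

    import numpy as np

    W, H = 64, 64

    # Ray-marching style: every pixel pays for N samples along its ray.
    def density(x, y, z):                            # toy radiance field
        return np.exp(-((x - 0.5)**2 + (y - 0.5)**2 + (z - 0.5)**2) / 0.02)

    def raymarch(n_samples=64):
        img = np.zeros((H, W))
        for j in range(H):
            for i in range(W):
                T = 1.0                              # transmittance along the ray
                for s in range(n_samples):           # W * H * n_samples field evaluations
                    z = s / n_samples
                    a = 1.0 - np.exp(-density(i / W, j / H, z) / n_samples)
                    img[j, i] += T * a
                    T *= 1.0 - a
        return img

    # Splatting style: each Gaussian only writes to the pixels its footprint covers.
    def splat(gaussians):
        img = np.zeros((H, W))
        for cx, cy, sigma, opacity in gaussians:     # real 3DGS sorts by depth and alpha-blends
            r = int(3 * sigma)
            for j in range(max(0, cy - r), min(H, cy + r)):
                for i in range(max(0, cx - r), min(W, cx + r)):
                    w = np.exp(-((i - cx)**2 + (j - cy)**2) / (2 * sigma**2))
                    img[j, i] += opacity * w
        return img

    splat([(32, 32, 4, 0.8), (20, 44, 3, 0.5)])      # cost ~ sum of footprint areas
    raymarch()                                       # cost ~ W * H * n_samples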
Crazy good results, but without the paper (the link, at the time of writing, just goes back to the site) it's a bit difficult to check how good. What data is required, and how long are the training runs / how many steps?
Using 200 photos taken with a conventional camera, the result renders at 105 frames per second (the quality of video game imagery) and gives the illusion of walking through the video. Better still, if you zoom in, you can see finer details, such as the spokes of a bicycle wheel, in excellent detail.
It uses neural-network techniques, but it's not strictly a neural network.
It achieves the same results as Google's NeRF in 30 minutes and Nvidia's in 7 minutes. It can achieve more than 100 fps if you let it train longer.
Even old image data is pretty useful. If they could make a 3d view that seamlessly integrated satellite, plane, and street level imagery into one product, it would be a much better UX than having to manually switch to street view mode.
Google uses texture mapped polygons instead of 3D Gaussians, so this wouldn't work for Google Maps. But there actually is a collection of libraries which does the same thing for polygonal data: https://vcg.isti.cnr.it/nexus/
One of the guys working on this is Federico Ponchio. His 2008 PhD thesis, which provided the core insight for Unreal Engine's Nanite, is referenced at the bottom.
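For anyone curious what "the same thing" means here, the rough idea (heavily simplified, names and numbers made up, as I understand the Nexus/Nanite lineage and the splat hierarchy in the linked work) is a tree of pre-simplified chunks, where you stop descending once a chunk's simplification error projects to less than a pixel or two on screen:

    from dataclasses import dataclass, field

    @dataclass
    class LODNode:
        geometric_error: float                  # simplification error of this chunk, world units
        distance: float                         # distance from the camera (precomputed here)
        children: list = field(default_factory=list)

    def projected_error_px(node, focal_px):
        # World-space error projected to pixels: shrinks as the chunk gets farther away.
        return focal_px * node.geometric_error / max(node.distance, 1e-6)

    def select_cut(node, focal_px, threshold_px, out):
        # Stop descending once this chunk's error would be invisible from the current view.
        if projected_error_px(node, focal_px) <= threshold_px or not node.children:
            out.append(node)
        else:
            for child in node.children:
                select_cut(child, focal_px, threshold_px, out)

    root = LODNode(1.0, 5.0, [LODNode(0.01, 5.0), LODNode(0.01, 50.0)])
    chosen = []
    select_cut(root, focal_px=1000.0, threshold_px=2.0, out=chosen)
    # The root's ~200 px error is visible, so the two finer chunks are rendered instead.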
> Google uses texture mapped polygons instead of 3D Gaussians,
Time to switch I'd say...
Polygons are a poor fit, especially for trees and windows and stuff that needs to be semitransparent/fluffy.
I suspect the gaussians will compress better, and give better visual quality for a given amount of data downloaded and GPU VRAM. (the current polygon model uses absolutely loads of both, leading to a poor experience for those without the highest end computers and fast internet connections).
I am really impressed by the Apple Maps implementation. I think it also uses textured polygons, but does so in a very good looking way and at 120 fps on an iPhone, showing even a whole city in textured 3d.
Apple bought a Swedish startup called C3, and their technology became the 3D part of Apple Maps. That startup was a spin-off from Saab Aerospace, who had developed a vision system for terrain-following missiles. Saab ran a project with the municipal innovation agency in Linköping, and the result was a decision that it should be possible to find civilian use cases for this tech. C3 decided to fly small Cessnas in grids across a few major cities and also Hoover Dam, and built a ton of code on top of the already extremely solid foundation from Saab. The timing was impeccable (now many years ago) and they managed to get Microsoft, Apple and Samsung into a bidding war, which drove up the price. But it was worth it for Apple to have solid 3D in Apple Maps, and the tech has stood the test of time.
I remember seeing a Nokia or Here demo around that time that looked like similar or the same tech. Do you know anything published about it with technical details? Seems like enough time has passed that it would be more accessible. I would love to learn more about it.
So this is just Level-of-Detail (LoD) implemented for Gaussian splats? Impressive results, but I would have figured this is an obvious next-step...
Also, is it bad that the first thing I thought of was that commanders in the Ukraine war could use this? E.g.: stitch together the video streams from thousands of drones to build up an up-to-date view of the battlefield?
Gaussian splatting feels magical, and with 4D Gaussian splatting now being a thing, 3D movies that are actually 3D, and that you can navigate around in, could be a reality in the coming years. (And I suspect the first use case will be porn, as usual.)
Can anyone familiar with 3d graphics speculate what would be required to implement this into a game engine?
I'm guessing that adding physics, collision-detection etc. on top of this is non-trivial compared to using a mesh?
But I feel like for stuff like tree foliage (where maybe you don't care about collisions?) this would be really awesome, given the limitations of polygons. Plus also just any background scenery, stuff out of the player's reach.
It's easy to render these in a game engine. I'm sure physics and collision detection are possible. The big, huge, gigantic issue is actually lighting.
These scenes come with real world lighting baked in. This is great because it looks amazing, it's 100% correct, far better than the lighting computed by any game engine or even movie-quality offline ray tracer. This is a big part of why they look so good! But it's also a curse. Games need to be interactive. When things move, lighting changes. Even something as simple as opening a door can have a profound effect on lighting. Anything that moves changes the lighting on itself and everything around it. Let alone moving actual lights around, changing the time of day, etc.
There's absolutely no way to change the baked-in lighting in one of these captures in a high quality way. I've seen several papers that attempt it and the results all suck. It's not the fault of the researchers, it's a very hard problem. There are two main issues:
One, in order to perfectly re-light a scene you first have to de-light it, that is, compute the lighting-independent BRDF of every surface. The capture itself doesn't even contain enough information to do this in an unambiguous way. You can't know for sure how a surface would react under different lighting conditions than were present in the pictures that made up the original scan. Maybe in theory you can guess well enough in most cases and extrapolate, and AI can likely help a lot here, but in practice we are far away from good quality so far.
Two, given the BRDF of all surfaces and a set of new lights, you have to apply the new lighting to the scene. Real-time solutions for lighting are very approximate and won't be anywhere near the quality of the lighting in the original scan. So you'll lose some of that photorealistic quality when you do this, even if your BRDFs are perfect (they won't be). It will end up looking like regular game graphics instead of the picture-perfect scans you want. If you try to blend the new lighting with the original lighting, the boundaries will probably be obvious. You're competing with perfection! Even offline rendering would struggle to match the quality of the baked-in lighting in these captures.
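A toy Lambertian version of the first problem, with made-up numbers: the capture only ever shows you the product of albedo and incoming light, so many different factorizations match it perfectly, and they disagree badly the moment you plug in a new light:

    # The capture only records albedo * incoming light multiplied together.
    observed = 0.36                    # captured value for some surface point

    candidates = [
        (0.9, 0.4),                    # bright surface, dim lighting
        (0.6, 0.6),                    # medium surface, medium lighting
        (0.4, 0.9),                    # dark surface, bright lighting
    ]
    for albedo, irradiance in candidates:
        assert abs(albedo * irradiance - observed) < 1e-9   # all match the capture equally well

    # Relight with a new light (say irradiance 0.8): the guesses now diverge.
    for albedo, _ in candidates:
        print(albedo * 0.8)            # 0.72, 0.48, 0.32 -- wildly different answers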
To me the ultimate solution needs to involve AI. Analytically relighting everything perfectly is infeasible, but AI can likely do approximate lighting that looks more plausible in most cases, especially when trying to match captured natural lighting. I'm not sure exactly how it will work, but AI is already being used in rendering and its use will only increase.
You've elucidated very clearly an issue that I've been thinking about since the very first time I saw gaussian splats. The best idea I've had (besides "AI magic") is something like pre-calculating at least two different lighting states, e.g. door open and door closed, or midday and evening, and then blending between them.
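Concretely, something like this minimal sketch, assuming each splat carries one baked colour per captured state (field names are hypothetical, and real 3DGS stores view-dependent spherical-harmonics colour rather than a single RGB):

    from dataclasses import dataclass

    @dataclass
    class RelightableSplat:
        position: tuple          # (x, y, z)
        opacity: float
        color_day: tuple         # RGB baked from the midday capture
        color_evening: tuple     # RGB baked from the evening capture

    def blended_color(splat, t):
        # t = 0.0 -> pure midday capture, t = 1.0 -> pure evening capture
        return tuple((1 - t) * d + t * e
                     for d, e in zip(splat.color_day, splat.color_evening))

    s = RelightableSplat((0.0, 1.0, -2.0), 0.9, (0.80, 0.70, 0.60), (0.20, 0.15, 0.30))
    print(blended_color(s, 0.25))    # 75% midday, 25% evening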
Do you know if anyone has tried this? Or otherwise, what're the best current attempts at solving it?
There is at least one "large scale gaussian splatting" type paper that did splatting for a few city blocks and they used data from across the day to build the final model such that you could set the time of day and the model would roughly reflect lighting at that time.
Fascinating - thanks for the detailed reply. I can’t believe I failed to think of lighting but this makes so much sense.
It’s almost like, for a pure ML solution, you would need a NeRF (or similar) that is conditioned on the entire (dynamically changing) scene geometry and lighting positions?
Graphics isn’t my area of ML though, so I’m sure there’s a lot of nuance that I don’t appreciate.
Thanks for pointing out the challenges with gaussian splattings. Are there any AI based relighting methods out there?
Some prompt-based editing like nerf2nerf or language-embedded NeRFs, maybe?
I worked in game engines for a long time. The main hurdle is just that it’s new. There’s a many-decade legacy pipeline of tools and techniques built around triangles. Splats are something new.
The good news is that splats are really simple once they’ve been generated. Maybe simpler than triangles depending on how you look at it. It’s just a matter of doing the work to set up new tools and pipelines.
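To illustrate how simple: here's roughly what a single trained splat boils down to (a sketch of the 3DGS representation, not any particular engine's or file format's layout):

    from dataclasses import dataclass

    @dataclass
    class GaussianSplat:
        position: tuple     # (x, y, z) centre in world space
        scale: tuple        # (sx, sy, sz) ellipsoid axes
        rotation: tuple     # quaternion (w, x, y, z) orienting the ellipsoid
        opacity: float      # 0..1, used for alpha blending
        sh_coeffs: list     # spherical-harmonics colour, view-dependent shading

    s = GaussianSplat((1.0, 0.2, -3.0), (0.05, 0.02, 0.08), (1, 0, 0, 0), 0.85, [0.6, 0.4, 0.3])

    # Rendering a frame is then: cull, project each splat to a 2D Gaussian,
    # sort by depth, and alpha-blend -- no UVs, no topology, no materials.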
Game physics often uses a separate mesh from the one used for rendering, or even a combination of primitive shapes, anyway. So it doesn't matter how the graphics side is rendered. There's no point wasting resources on details which don't affect gameplay, and having too much tiny collision geometry increases the chance of the player getting stuck or snagging on it.
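A sketch of that decoupling, with hypothetical names: the physics engine only ever sees the coarse colliders, so it doesn't care whether the visual side is triangles or splats:

    from dataclasses import dataclass, field

    @dataclass
    class BoxCollider:
        center: tuple
        half_extents: tuple                            # coarse box, none of the per-splat detail

    @dataclass
    class SceneObject:
        transform: tuple                               # object position
        splats: list = field(default_factory=list)     # what the GPU draws
        colliders: list = field(default_factory=list)  # what the physics engine queries

    crate = SceneObject(transform=(10.0, 0.0, 5.0),
                        splats=[],                     # imagine millions of Gaussians here
                        colliders=[BoxCollider((0.0, 0.5, 0.0), (0.5, 0.5, 0.5))])

    # The physics step never looks at `splats`; swapping the renderer from
    # triangle meshes to Gaussian splats leaves collision queries untouched.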
This particular implementation, yes, but it's mostly for landscapes (or maybe scenes) anyway, and you don't need this kind of LOD machinery for the things you want to animate.
People should stop basing all of this new research on proprietary software, when we have open source implementations [1][2].
[1] gsplat: https://github.com/nerfstudio-project/gsplat
[2] opensplat: https://github.com/pierotofy/opensplat