So does this paper, although they get basically the same result from random initialization. But as mentioned, the point cloud is free since they need the camera poses anyway.
> We initialize our samples either randomly or from point clouds, typically from Structure-from-Motion (SfM) as in 3DGS
> But as mentioned, the point cloud is free since they need the camera poses anyway.
If you have a rigid multi-camera rig, the camera poses might be known from calibration; in that case, a scene shot on such a rig could be reconstructed without COLMAP or other structure-from-motion tools, if I understand it correctly.
As someone who works in numerical optimization, this is a dirty little secret of our profession. The optimization algorithms in the literature are great at finding local minima, but how small an objective value they reach is often very sensitive to the initialization. Good heuristics for initialization are thus critical for finding a good (small-objective) minimizer. Sometimes this gets to the point where the local optimization algorithm does a trivial refinement of the heuristic’s solution.
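A toy illustration (my own, not from the paper): run the same gradient descent on a slightly tilted double-well objective from two different starting points, and you land in two different local minima with different objective values -- exactly the sensitivity described above.

```python
import numpy as np

# Tilted double-well: two local minima with different objective values.
def f(x):
    return (x**2 - 1)**2 + 0.3 * x

def grad(x):
    return 4 * x * (x**2 - 1) + 0.3

def gradient_descent(x0, lr=0.02, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Same algorithm, two initializations, two different minima.
x_right = gradient_descent(1.0)    # converges in the right basin
x_left = gradient_descent(-1.0)    # converges in the left basin
print(f(x_right), f(x_left))       # the left basin is the better minimizer
```

Both runs "succeed" in the sense of driving the gradient to zero; only the initialization decides which minimum you get.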
Gaussian splatting is such an impressive technique. I hope it finds real-world applications useful for the average person -- right now it's probably the best way to show photorealistic scenes in VR. There ought to be more uses out there.
There are a lot of use cases, and companies already using photogrammetry techniques have adopted it extremely quickly. But for many of these businesses, Gaussian splats are just a technical implementation detail: they bring quality and feature gains, but as of now don't unlock entirely new business models.
It's true, Gaussian Splatting is just an alternative to meshing a pointcloud for companies which currently rely on photogrammetry or lidar (lidar works well as a basis for splatting when reference images are taken as part of the scan). But I think that misses all the new opportunities that exist with Gaussian Splatting, which really just don't exist with existing techniques.
Gaussian Splats are able to handle more heterogeneous information sources, allowing more sources to help splat an environment. Devices like drones, surveillance cameras, or autonomous systems can be used to create or incrementally update a Gaussian Splat; and there's interesting work to allow them to locate themselves within the splat, not just to show themselves but also to place vision ML outputs into it (such as object detection or segmentation results).
Up till now, nearly all digital representations of physical environments are either based on the original designs (in things like CAD or BIM files), or are an approximation of the environment (from photogrammetry or lidar scans). CAD and BIM files suffer from drift: the real environment almost never perfectly matches the design files, small (and large) changes are made, and many times those files aren't even available if the structure isn't new. Photogrammetry and lidar scans struggle because their output is a pointcloud, and it's very difficult to accurately mesh a pointcloud (Matterport only partially solved this problem and sold for $1.6B). Gaussian Splats overcome these issues: they're comparatively easy to generate for any environment, and allow for very accurate and easy viewing from any angle.
I think the Digital Twin space will be turned upside down, and they could potentially even cause huge changes in autonomous and semi-autonomous factories, warehouses, and depots. A single Gaussian Splat could be the source of truth that many autonomous vehicles update through their separate SLAM systems. Operators then would have access to this splat (and its history) as a source of truth for the environment. Then, using techniques like iComMa[1], it may be possible to directly align XR devices into the Gaussian Splat, allowing operators direct access to location-based information generated by the environment.
That's a lot of words to say: Gaussian Splatting is a very neat new technology that could really underpin many future technologies. I'm really excited about it.
I do agree that new use cases are emerging and it will probably enable tons of new businesses. I'm very gung-ho about the technology myself as well. I guess what I'm trying to say is that the new businesses that emerge because of this are not necessarily going to advertise that they use gaussian splats to do it; it's not a buzzy enough term, and many of the industries it serves just care about the results it delivers. Your average tech person is unlikely to hear much about it. Your average graphics engineer will have probably heard about it, but not know about all the use cases that are leveraging it. And your average person in the industry it is changing won't know what is causing the change (they will probably assign it to the nebulous AI bucket). I fully expect gaussian splats to be a quiet revolution.
Yeah, I see your point. I'd be surprised if Gaussian Splatting didn't make it into the advertising for Digital Twin services if/when they add it (like Bentley's iTwin or Dassault's Virtual Twin). Whether that translates more broadly into the market, I don't know.
On the other hand, I'm playing with the idea of a platform which provides a Gaussian Splat based Digital Twin of an environment so other systems can utilize it to share location-based information. Even though I don't think it'll be possible to build without utilizing Gaussian Splatting, splatting may not end up in any of the pitches or advertising directly.
This is conflating splatting with more general pointcloud data.
Splatting is fundamentally about viewing pointcloud data. That's great. But it doesn't deal with all the other functions virtual twins need pointclouds for (e.g. design vs real world conformance).
Pointclouds themselves are proving hugely useful in a number of fields but vary considerably in form and application often based on how they are captured (e.g. LiDAR vs SfM photogrammetry)
Visualising pointclouds effectively has been a major problem which splatting really solves elegantly so it will be a major practical advance when splatting is added to cad software and javascript map visualisation libraries.
this is very interesting and thought-provoking; thank you
what exactly do you mean by 'digital twin'? do you mean any kind of computer model of a real-world phenomenon, including sets of differential equations, as i've sometimes seen it used? presumably you mean something narrower, but how narrow? do you mean, for example, specifically cad models of parts that are going to be manufactured?
i guess this sounds like i'm nitpicking but actually i just want to know the scope of the space that you expect to be turned upside down
For an average person, I suspect that some real estate listings will adopt Gaussian splatting soon. Many newer listings include photogrammetry already, and splatting will provide a significant improvement to existing solutions.
It’s not exactly “average person” usage, but Corridor Digital used it in one of their video productions to create a looping, infinite corridor with proper reflections and lighting etc.: https://www.youtube.com/watch?v=GaGcLhhhbDs
It's important to distinguish this specific approach -- generating a gaussian splatting model from 2D images -- from gaussian splats as a general rendering technique.
There is nothing that prevents gaussian splatting from being used dynamically. There are a variety of approaches to extend gaussian splats into the time dimension to capture and represent a 3D scene over time. The challenges here are about how to capture sufficient scene data (or use AI to fill in insufficient data) and how to compress it. There are also techniques that enable dynamic simulations, or real time animation of collections of splats.
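As a concrete sketch of one such extension (a generic construction, not any specific paper's method): treat each splat as a 4D Gaussian over (x, y, z, t) and condition on the frame time t. Any space-time correlation in the covariance then shows up as motion of the conditional 3D mean over time.

```python
import numpy as np

def condition_on_time(mean4, cov4, t):
    """Condition a 4D (x, y, z, t) Gaussian on a fixed time t,
    returning the 3D mean and covariance to rasterize at that frame.
    These are the standard Gaussian conditioning formulas."""
    mu_x, mu_t = mean4[:3], mean4[3]
    s_xx = cov4[:3, :3]          # spatial covariance
    s_xt = cov4[:3, 3]           # space-time correlation
    s_tt = cov4[3, 3]            # temporal variance
    mean3 = mu_x + s_xt * (t - mu_t) / s_tt
    cov3 = s_xx - np.outer(s_xt, s_xt) / s_tt
    return mean3, cov3

# A splat centered at the origin at t=0 whose x-coordinate is
# correlated with time: conditioning at later t moves it along +x.
mean4 = np.array([0.0, 0.0, 0.0, 0.0])
cov4 = np.eye(4)
cov4[0, 3] = cov4[3, 0] = 0.5    # x drifts with t
m0, _ = condition_on_time(mean4, cov4, 0.0)
m1, _ = condition_on_time(mean4, cov4, 1.0)
```

With zero space-time correlation this degenerates to a static splat; richer motion models (polynomials, per-splat trajectories) are where the real design work happens.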
Also, adding un-baked lighting to gaussian splats is not particularly hard; you can already throw splats into several game engines / 3D renderers and add new lights to them. The hard part of relighting is taking an existing capture of a scene with baked-in lighting and deriving the resulting material properties and lighting sources. This isn't directly related to gaussian splats themselves though; you would have a similar problem recovering the base materials and lights from a 3D mesh with baked-in lighting textures. This really falls under a separate category of techniques called "inverse rendering". If anything, gaussian splats give us a new tool to help with these sorts of problems.
Honestly the biggest remaining roadblocks to more elaborate and widespread uses of gaussians as a rendering method are probably storage and performance related. And I'm optimistic these will be convincingly solved, triangle rasterization has had many orders of magnitude more research, optimization, and custom hardware built around it.
I don't know if anyone has actually done it yet, but for scans of models in controlled lighting environments, you could use higher dimensional Gaussian Splatting to get more accurate lighting than standard techniques. (Similar to the 4D version with time, use higher dimensions to encode light source coordinates.)
or spherical harmonics of the illuminance field; you only need a very few spherical harmonics to get an excellent approximation of lambertian diffuse lighting
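for concreteness, the classic result (ramamoorthi and hanrahan's irradiance environment maps) is that nine coefficients, bands 0 through 2, capture lambertian diffuse irradiance to within roughly 1% error. a sketch of evaluating such an order-2 real SH expansion at a surface normal (the constants are the standard real SH normalization factors):

```python
import numpy as np

def sh_basis_order2(n):
    """The 9 real spherical-harmonic basis functions (bands 0-2)
    evaluated at unit normal n = (x, y, z)."""
    x, y, z = n
    return np.array([
        0.282095,                       # Y_0^0
        0.488603 * y,                   # Y_1^-1
        0.488603 * z,                   # Y_1^0
        0.488603 * x,                   # Y_1^1
        1.092548 * x * y,               # Y_2^-2
        1.092548 * y * z,               # Y_2^-1
        0.315392 * (3 * z * z - 1.0),   # Y_2^0
        1.092548 * x * z,               # Y_2^1
        0.546274 * (x * x - y * y),     # Y_2^2
    ])

def irradiance(coeffs, n):
    """Approximate diffuse irradiance at normal n from 9 SH
    coefficients of the environment lighting."""
    return float(coeffs @ sh_basis_order2(n))

# Purely ambient light: only the band-0 (DC) coefficient is nonzero,
# so every normal direction receives the same irradiance.
ambient = np.zeros(9)
ambient[0] = 1.0
```

higher bands of the incoming light are strongly attenuated by the cosine-lobe convolution, which is why so few coefficients suffice for diffuse.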
Doesn't VR generally require high framerates at high resolution? Gaussian splatting seems to be one of those borderline-realtime things right now which can't run at 4k120fps yet on consumer hardware.
It's more expressive than mesh-and-texture, which is the default: it can represent things with poorly defined edges or transparency. Also, because it's solved via gradient descent, you tend to get better fidelity than traditional photogrammetry that relies on multi-view correspondence.
But also you're just throwing a ton of primitives at the problem; high quality scenes typically have millions of splats. That's a lot of data, so it's no wonder it can be pretty photorealistic. (Still impressive, though.)
OK, I read the paper, agree the results look good, like the idea of better formal grounding for how to choose where your splats are, and ... I still have no idea what that top image is of. Is it the distribution of where they put initial splats for any given 2D image?? Why does the caption mention buildings? I'm really lost.
Ahhhh, thank you. It is a video that does not work with Safari on Mac or iOS. It works great on Firefox, and I presume Chrome based on the lack of complaints :)
So, just to make it clear, the main difference in this paper is adding a small amount of noise to each update? I'm a little frustrated that I read through the whole paper and still am not sure about this.
No, the main difference is looking at the problem from a new perspective, relating it to a large body of existing work in statistics (see https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo and https://en.wikipedia.org/wiki/Stochastic_gradient_Langevin_d...). Then they are able to use this new perspective to add several improvements that lead to significant quality gains, clearly establishing the validity and utility of the new theoretical underpinnings. It's likely this will be massively influential in the direction of future research in the space.
The actual changes this led to are:
1) As you mentioned, they added noise. But notably, they state it was "designed carefully" to conform to the requirements of SGLD and they detail how they designed the noise.
2) They simplified the original operations of "move, split, clone, prune, and add" and their related heuristics, into a single operation type. They do so guided by existing knowledge about MCMC frameworks, leading to a simpler model with stronger theoretical underpinnings (huge win!).
3) Adjustments to how gaussians are added and pruned to better fit with the new model. This seems more like housekeeping rather than something novel in and of itself.
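For intuition about why the noise has to be "designed carefully", here is a generic Langevin/SGLD-style sketch on a toy 1D target (not the paper's actual noise design): the injected Gaussian noise must have variance matched to the step size, so that the iterates sample from the target distribution instead of collapsing to its mode the way plain gradient descent would.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(theta):
    # Negative log-density of a standard normal target: U(theta) = theta^2 / 2
    return theta

def sgld(theta0, eps=0.01, steps=50_000):
    """Langevin update: half a gradient step plus noise of variance eps.
    With the noise, iterates sample approximately N(0, 1); without it,
    plain gradient descent would just converge to the mode at 0."""
    theta = theta0
    samples = np.empty(steps)
    for i in range(steps):
        theta += -0.5 * eps * grad_U(theta) + rng.normal(0.0, np.sqrt(eps))
        samples[i] = theta
    return samples

samples = sgld(theta0=3.0)
print(samples.mean(), samples.std())  # roughly 0 and 1
```

Scale the noise wrongly (or drop it) and the chain no longer has the target as its stationary distribution; that matching is essentially what "conform to the requirements of SGLD" means.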
I'm not an expert at all, but my understanding is that Gaussian splatting is essentially a rendering technique. Normally, you'd take actual data, like a set of photos, and optimize some Gaussians against it to arrive at a volumetric representation. In the case of AI-generated splats, it's kinda flipped. Instead of optimizing against a known ground truth, the AI is generating Gaussians to be rendered. The insight of this paper is that we already have great statistical tools for numerically estimating ground truth based on a bunch of Gaussians, so why not just apply those?
> Unlike existing approaches to 3D Gaussian Splatting, we propose to interpret the training process of placing and optimizing Gaussians as a sampling process. Rather than defining a loss function and simply taking steps towards a local minimum, we define a distribution G which assigns high probability to collections of Gaussians which faithfully reconstruct the training images.
What is the practical difference here? MCMC itself samples more from higher probabilities than lower ones (i.e. towards a local minimum). Is it just that we sample more from lower ends of the distribution? Or is it more about formalizing the previous algorithm so that it is easier to play with the different parameters? (e.g. the acceptance threshold)
What are the “consumer” applications of 3D splatting? It looks super cool, and I can see applications for the next gen of maps, but I don’t understand if it’s going to be an end user technology or not.
do you mean something that end users are aware of using? probably not. but probably everything end-users do that involves digital models of three-dimensional objects will use gaussian splatting fairly soon: cad, cam, photography, maps, video games, surgery, georectification of satellite images, diagnostic radiography, ultrasound, radar, sonar, animation, missile guidance, other robotics applications, fashion design, 3-d printing, etc.
It's very unlikely it will be used in video games. Video games pretty much exclusively use texture-mapped polygon meshes as their data structure. They're easy to render, they play well with object animations, they allow for dynamic lights, and there is a lot of mature software to create and edit this data. 3D Gaussian splats also render quite fast, but they are bad for animations, they don't allow for dynamic lights (as far as I can tell), and there are basically no tools to create and edit models, apart from photogrammetry.
Probably about as consumer-related as any other part of a 3d rendering pipeline. Most 3D rendered images you see-- VFX, motion graphics, animation-- were rendered and composited into video clips. Some things like video games render 3D in real time.
When you get used to clicking a link to jump to a cited reference, the whole scroll-to-reference, copy-paste-into-Google, scroll-back-to-the-paper song and dance seems a lot less fun.
The paper is also on arXiv, which includes the TeX source. The experimental HTML view for this paper has automatic (internal) hyperlinks like hyperref: https://arxiv.org/abs/2404.09591
Edit: The TeX source uses the `draft` option for hyperref, which disables hyperlinks in the produced PDF. External links to references aren't recoverable from the included main.bbl (probably because it was built with the `draft` option).
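For anyone rebuilding the PDF from the arXiv source, the relevant bit is presumably an option along these lines; dropping `draft` (or overriding it with `final`) re-enables the links:

```latex
% In the preamble: `draft` turns every hyperlink into plain text.
\usepackage[draft]{hyperref}
% To restore clickable internal and external links instead:
% \usepackage[final]{hyperref}
```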
Yeah, scientific papers are a pain to read. Some have a LaTeX extension which links you to the paper down in the references, but it's not bidirectional -- you have to scroll back to your previous position.