A while back, after I got a Quest 2, I started to dive into the world of photogrammetry. I went through the entire pipeline of building a 3D *model* from photos of an object taken from different angles. The pipeline involved using MeshRoom and a few other pieces of software to clean up the mesh and port it into Unity.
In the end, from my superficial understanding, the problem with porting anything into VR (say into Unity, where you can walk around an object) is the importance of creating a clean mesh. The output of tools like the OP's (I haven't dived deep into it yet) is a point cloud in 3D space; they do not generate a 3D mesh.
Going from memory of the tools I came across during my research, there are tools like this: https://developer.nvidia.com/blog/getting-started-with-nvidi... Again, this does not generate a mesh. I think it just produces a video, not something you can simply walk around in VR.
My low-key motivation was to build a clone of what Matterport does and sell it to real estate companies. The major gap in my understanding, and the reason I lost steam, is that I was not sure how they are able to automate the step of generating a clean mesh from a bunch of photos taken with a camera. To me, this is the most labor-intensive part. Later, I heard there are ML models that can do this very step, but I have no idea about them.
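For what it's worth, turning a point cloud into a mesh is at least partly automatable with off-the-shelf tools, even if real scans still need manual cleanup of holes and floating junk. A minimal sketch using Open3D's Poisson surface reconstruction (the file names and the density cutoff are just placeholders):

    import numpy as np
    import open3d as o3d

    # Load the point cloud exported from the photogrammetry/capture tool.
    pcd = o3d.io.read_point_cloud("scan.ply")

    # Poisson reconstruction needs oriented normals.
    pcd.estimate_normals()
    pcd.orient_normals_consistent_tangent_plane(30)

    # Fit a watertight surface; higher depth = more detail and more memory.
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)

    # Crude cleanup: drop vertices with little point support (the bubbly artifacts).
    densities = np.asarray(densities)
    mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.05))

    o3d.io.write_triangle_mesh("mesh_for_unity.obj", mesh)

The result usually still needs retopology and decimation before it's VR-friendly, which is where most of the labor goes.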
Perhaps using Unreal + Nanite + PCVR would be a better option? Nanite can handle highly complex meshes and algorithmically simplify them in realtime, basically a highly advanced LOD system. Not sure what the limitations are, but it's worth a try. Also, I highly recommend using Reality Capture for photogrammetry. The pricing is super cheap and you pay per scan.
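Nanite does that simplification at render time on PC-class hardware; as a rough offline analogue (and about the only option for standalone headsets), quadric decimation gives the same kind of triangle-budget control. A sketch using Open3D, with an arbitrary target count:

    import open3d as o3d

    mesh = o3d.io.read_triangle_mesh("raw_photogrammetry_mesh.obj")
    print("before:", len(mesh.triangles), "triangles")

    # Collapse edges until a triangle budget is reached (100k is arbitrary);
    # in practice you'd export several budgets as discrete LOD levels.
    simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=100_000)
    simplified.compute_vertex_normals()
    print("after:", len(simplified.triangles), "triangles")

    o3d.io.write_triangle_mesh("mesh_lod1.obj", simplified)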
NeRFs are sort of last year's technology. The latest hype is about gaussian splats.
My understanding is that these technologies essentially take some images as input and then train a model, where the model is learning the best way to render those images back, in some sense. I think for gaussian splats, the scene is represented as a set of "blobs" in space, and every image has to be renderable from its own perspective using that same set of blobs, so by positioning the splats such that each image is rendered correctly, you can reproduce the scene.
This training is currently very expensive and has to be done separately for each scene, but it produces an output that can be explored in real time.
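As a toy illustration of what that per-scene optimisation loop looks like: fit a bunch of isotropic 2D gaussian "blobs" so that a rendered image matches a target image. The real method uses anisotropic 3D gaussians, proper alpha compositing and many camera views at once, so treat this purely as a sketch of the render-compare-adjust idea:

    import torch

    H, W, N = 64, 64, 200
    target = torch.rand(H, W, 3)  # stand-in for a real photo of the scene

    # Learnable splat parameters: position, size, colour, opacity.
    pos = torch.rand(N, 2, requires_grad=True)
    log_scale = torch.full((N,), -3.0, requires_grad=True)
    colour = torch.rand(N, 3, requires_grad=True)
    opacity = torch.zeros(N, requires_grad=True)

    ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
    pixels = torch.stack([xs, ys], dim=-1).reshape(-1, 2)  # (H*W, 2)

    def render():
        # Each pixel accumulates a weighted contribution from every blob.
        d2 = ((pixels[:, None, :] - pos[None, :, :]) ** 2).sum(-1)  # (H*W, N)
        weight = torch.sigmoid(opacity) * torch.exp(-d2 / (2 * torch.exp(log_scale) ** 2))
        return (weight @ torch.sigmoid(colour)).reshape(H, W, 3)

    opt = torch.optim.Adam([pos, log_scale, colour, opacity], lr=1e-2)
    for step in range(500):
        opt.zero_grad()
        loss = torch.nn.functional.l1_loss(render(), target)  # render, compare...
        loss.backward()
        opt.step()                                            # ...adjust the splats

With many photos, the same set of splats has to explain every view simultaneously, which is what forces them into a consistent 3D arrangement.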
I think the photogrammetry approaches used by Matterport et al. are older and require much higher-quality input data, whereas the newer approaches can work with much less, and lower-quality, data.
https://github.com/3DTopia/OpenLRM (They mention NeRF as inspiration, but it seems the original paper it was based on decided to use vision transformers. The open-source version seems to use Meta's DINO as one of its key components.)