It uses coordinate-based neural networks to model the scene volumetrically. However, this paper does not use an MLP to represent the scene; instead, it proposes to directly learn a voxel grid representation.
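A minimal sketch of what "directly learning a voxel grid" can look like, assuming a PyTorch-style implementation (the grid size, channel count, and `query` helper are illustrative, not the paper's actual layout):

```python
import torch
import torch.nn.functional as F

# Trainable voxel grid: 4 channels (e.g. RGB + density) on a 64^3 lattice,
# shaped (N, C, D, H, W) as F.grid_sample expects.
grid = torch.nn.Parameter(torch.zeros(1, 4, 64, 64, 64))

def query(points):
    """Trilinearly interpolate grid values at points in [-1, 1]^3.

    points: (P, 3) tensor of (x, y, z), where x indexes W, y indexes H, z indexes D.
    returns: (P, 4) tensor of interpolated channel values.
    """
    # grid_sample wants coordinates shaped (N, D_out, H_out, W_out, 3);
    # mode="bilinear" on a 5-D input performs trilinear interpolation.
    coords = points.view(1, -1, 1, 1, 3)
    vals = F.grid_sample(grid, coords, mode="bilinear", align_corners=True)
    return vals.view(4, -1).t()  # back to (P, 4)

# Because `grid` is a Parameter, any loss on the queried values
# backpropagates straight into the voxel values -- no MLP in the loop.
pts = torch.rand(1024, 3) * 2 - 1   # random sample points
out = query(pts)
out.sum().backward()                # grid.grad is now populated
```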
Yes, and then polygonal models (and other things) are built from those.
For anyone who wants a more technical dive into the photogrammetry pipeline, here's a video I made for a company called Mapware for NVIDIA GTC 21: https://youtu.be/ktDVWzshR4w?t=331
Some techniques for downsampling point clouds use voxel grid representations, but in general you're mapping pixel data from different images to each other in space and producing points from that to try to capture the surface geometry.
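The voxel-grid downsampling mentioned here usually amounts to quantizing points into cells and keeping one representative per cell. A rough numpy sketch (the cell size is an arbitrary example):

```python
import numpy as np

def voxel_downsample(points, voxel_size=0.05):
    """Collapse a point cloud to one centroid per occupied voxel.

    points: (N, 3) array of xyz coordinates.
    returns: (M, 3) array with M <= N.
    """
    # Integer cell index for every point.
    cells = np.floor(points / voxel_size).astype(np.int64)
    # Group points that share a cell, then average each group.
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.zeros((inverse.max() + 1, 3))
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse).reshape(-1, 1)
    return sums / counts

cloud = np.random.rand(100_000, 3)       # stand-in for a real scan
print(voxel_downsample(cloud).shape)     # far fewer points out
```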
But it does? Agisoft will first estimate depth maps and then project them into a voxel volume to extract the high-resolution mesh. The debug logging even lists the voxel grid dimensions.
Regular photogrammetry usually means searching for common features in a bunch of photos.
If you find the same feature in three photos, you can triangulate its location in 3D space (a rough code sketch of this step follows below).
The output of this process is a point cloud, which you can then process into a triangle mesh.
(Google "structure from motion".)
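To make the triangulation step concrete, here's a two-view sketch using OpenCV's `cv2.triangulatePoints` (the camera matrices and pixel coordinates are made-up values; a real pipeline recovers them via feature matching and bundle adjustment):

```python
import cv2
import numpy as np

# Hypothetical 3x4 projection matrices for two calibrated cameras
# (intrinsics K times [R|t]); real values come out of SfM.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                # camera at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.], [0.]])])  # shifted 0.5 right

# Pixel coordinates of the *same* feature observed in both photos,
# shaped (2, N) as cv2.triangulatePoints expects.
pts1 = np.array([[330.], [242.]])
pts2 = np.array([[310.], [241.]])

# Linear triangulation returns homogeneous 4-vectors; divide by w.
X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
X = (X_h[:3] / X_h[3]).ravel()
print("3D point:", X)
```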
This, OTOH, is differentiable voxel rendering: basically, optimizing the colors of a bunch of cubes until the render looks like the pictures.
Using backpropagation, just like you would for a neural network.
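In toy form that loop might look like this (PyTorch, orthographic rays along one axis, simple alpha compositing — all simplifying assumptions, not any particular paper's renderer):

```python
import torch

# Trainable scene: RGB color and density for every cell of a 32^3 grid.
rgb = torch.nn.Parameter(torch.rand(32, 32, 32, 3))
density = torch.nn.Parameter(torch.full((32, 32, 32), 0.1))

target = torch.rand(32, 32, 3)   # stand-in for one training photo

def render():
    """Composite the grid front-to-back along the z axis (orthographic rays)."""
    alpha = 1 - torch.exp(-torch.relu(density))       # per-cell opacity
    # Transmittance: how much light survives to reach each depth slice.
    trans = torch.cumprod(
        torch.cat([torch.ones(32, 32, 1), 1 - alpha[..., :-1]], dim=-1), dim=-1)
    weights = (trans * alpha).unsqueeze(-1)            # (32, 32, 32, 1)
    return (weights * torch.sigmoid(rgb)).sum(dim=2)   # (32, 32, 3) image

opt = torch.optim.Adam([rgb, density], lr=0.05)
for step in range(200):
    loss = ((render() - target) ** 2).mean()   # photometric loss vs. the photo
    opt.zero_grad()
    loss.backward()                            # gradients w.r.t. the cubes
    opt.step()
```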