Hacker News
Show HN: Camera Calibration Using VGGT (rerun.io)
1 point by pablovelagomez 5 days ago
Continuing with my robot training data collection pipeline, one of the challenges has been obtaining calibrated cameras from sparse multi-view inputs. I previously worked with Dust3r but found its accuracy insufficient for producing reliable camera parameters. Recently, Meta released VGGT (Visual Geometry Grounded Transformer), which improves on both inference speed and accuracy. Under a couple of assumptions, namely non-metric scale and a pinhole camera model, I believe fully automatic camera calibration is now achievable.
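To make the pinhole assumption concrete, here is a minimal numpy sketch (my own illustration, not VGGT's API; the function names and the example camera values are made up) of composing predicted intrinsics and a world-to-camera pose into a 3x4 projection matrix and projecting a 3D point:

```python
import numpy as np

def projection_matrix(K, R, t):
    """Compose the 3x4 pinhole projection P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X_world):
    """Project a 3D world point to pixel coordinates (point assumed in front of the camera)."""
    X_h = np.append(X_world, 1.0)  # homogeneous coordinates
    x = P @ X_h
    return x[:2] / x[2]

# Illustrative values: focal length 500 px, principal point (320, 240),
# identity rotation, camera at the world origin.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)
P = projection_matrix(K, R, t)

# A point on the optical axis projects to the principal point.
print(project(P, np.array([0.0, 0.0, 2.0])))  # -> [320. 240.]
```

Note that under the non-metric-scale assumption, t (and hence the whole reconstruction) is only defined up to a global scale factor.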

Using @rerun, I established a baseline from the HoCAP dataset and conducted a qualitative comparison among the ground-truth calibrated cameras, Dust3r, and VGGT. The improvements are evident in both the camera parameters and the multi-view depth maps and point clouds.

I will eventually add quantitative comparisons, such as Relative Rotation Accuracy (RRA) and Relative Translation Accuracy (RTA).

I’m one step closer to a pipeline that integrates two iPhones and a Quest 3. Camera calibration was a major hurdle, and it has now been cleared. I believe the egocentric (first-person) perspective is critical for dataset collection at scale, while the exocentric (third-person) perspective will also be crucial for accuracy, especially for handling the occlusions that arise during fine-grained hand and object interactions.

You can find the integrated VGGT calibration code here: https://github.com/rerun-io/pi0-lerobot?tab=readme-ov-file#c...
