Hi all, I am Thomas and I develop volumetric video capture software for my startup ScannedReality. In volumetric videos, you can freely choose your viewpoint during playback. You could think of the videos as animated 3D models, but reconstructed from real video data instead of modeled by hand.
This makes them very interesting for Augmented and Virtual Reality, where they can be displayed in true scale in 3D as opposed to traditional videos that would play on a flat virtual screen.
I'd like to invite you to check out our browser-based demos at https://scanned-reality.com/demo, where you can change the viewpoint with the mouse or with your fingers on a touchscreen. If you are okay with installing an app, the link above also offers an Android app and a Meta Quest app (Quest 2 or later), both of which display the videos in AR.
We just released a beta version of our volumetric capture software, which lets you record your own volumetric videos and play them back on various platforms. If you happen to have compatible cameras (currently Azure Kinect, with other camera types to follow; we recommend four or more cameras) and an Nvidia graphics card (for CUDA), feel free to join the beta test here: https://scanned-reality.com/beta_join (Sorry for the sign-up requirement; you can unsubscribe at any time after joining.)
The background:
I am passionate about Virtual Reality, but one of its current issues is that there is relatively little interesting content for it. My hope is that volumetric video will become a type of content that both draws people into VR and sees growing demand as VR grows. Think of virtual concerts that you can watch as if you were on the stage, virtual theater where you can be among the actors, and so on.
Brief technical overview:
We used 32 RealSense D415 depth cameras, arranged all around the subject, to reconstruct the demo videos on our website. Each camera is connected to a Raspberry Pi 4 with an SSD, which basically turns it into a network camera with integrated storage. The Pis are all connected via a local network and time-synchronized using PTP. A "control center" PC controls the recording and later processes the recorded data.
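For anyone unfamiliar with PTP, here is a rough Python sketch of the idea. It shows the standard IEEE 1588 offset and path-delay calculation from one Sync/Delay_Req exchange (which assumes a symmetric network path), and then, as a hypothetical next step, picks the frame from each camera that is closest to a common target time. The names `PtpExchange`, `group_frames` and the tolerance parameter are made up for illustration and are not our actual code; in a real setup, a PTP daemon such as linuxptp would typically handle the clock discipline itself.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PtpExchange:
    t1: float  # master sends Sync (master clock)
    t2: float  # slave receives Sync (slave clock)
    t3: float  # slave sends Delay_Req (slave clock)
    t4: float  # master receives Delay_Req (master clock)


def offset_and_delay(ex: PtpExchange) -> Tuple[float, float]:
    """IEEE 1588 estimate of slave clock offset and one-way path delay.

    Assumes the network delay is the same in both directions.
    """
    forward = ex.t2 - ex.t1   # delay + offset
    backward = ex.t4 - ex.t3  # delay - offset
    offset = (forward - backward) / 2
    delay = (forward + backward) / 2
    return offset, delay


def to_master_time(local_ts: float, offset: float) -> float:
    """Convert a slave-clock timestamp into master-clock time."""
    return local_ts - offset


def nearest_frame(timestamps: List[float], target: float) -> Optional[int]:
    """Index of the frame whose (sorted) timestamp is closest to target."""
    i = bisect.bisect_left(timestamps, target)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    if not candidates:
        return None
    return min(candidates, key=lambda j: abs(timestamps[j] - target))


def group_frames(per_camera: List[List[float]], target: float,
                 tolerance: float) -> Optional[List[int]]:
    """Hypothetical helper: pick one frame per camera for a target time.

    Returns None if any camera has no frame within the tolerance.
    """
    picked = []
    for timestamps in per_camera:
        j = nearest_frame(timestamps, target)
        if j is None or abs(timestamps[j] - target) > tolerance:
            return None
        picked.append(j)
    return picked
```

Once every Pi's timestamps are expressed in the same clock, matching frames across all cameras reduces to a nearest-timestamp lookup like this.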
Using data from all cameras, we currently first reconstruct an independent 3D mesh for each video frame. Then, we track the motion between frames and partition the video into a series of chunks. One frame of each chunk is chosen as a keyframe, while each remaining frame of the chunk is represented by a deformation of the keyframe that matches that video frame. The geometry of a video is thus stored as a series of animated meshes. The texture is stored as an AV1-encoded video. The final videos are saved in a custom file format and played back using our own video player, since there is no standard format for volumetric videos yet.
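To illustrate the chunking idea, here is a simplified Python sketch. It assumes a hypothetical `fits(key, frame)` predicate that reports whether tracking can deform the keyframe to match a given frame (for example, alignment error below some threshold), always picks the first frame of a chunk as its keyframe, and represents a deformation as plain per-vertex offsets. These are all simplifications for illustration, not how our pipeline actually selects keyframes or parameterizes motion.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Chunk:
    keyframe: int      # frame whose mesh is stored in full
    frames: List[int]  # all frames covered by this chunk, keyframe included


def partition_into_chunks(num_frames: int,
                          fits: Callable[[int, int], bool],
                          max_len: int) -> List[Chunk]:
    """Greedily grow chunks while the keyframe can be deformed to each frame.

    Simplification: the first frame of each chunk is used as the keyframe.
    """
    chunks = []
    start = 0
    while start < num_frames:
        key = start
        end = start + 1
        while end < num_frames and end - start < max_len and fits(key, end):
            end += 1
        chunks.append(Chunk(keyframe=key, frames=list(range(start, end))))
        start = end
    return chunks


def deform(key_vertices: List[Vec3], offsets: List[Vec3]) -> List[Vec3]:
    """Reconstruct a frame's vertices from the keyframe plus per-vertex offsets.

    The mesh connectivity (triangles) is shared with the keyframe.
    """
    return [(x + dx, y + dy, z + dz)
            for (x, y, z), (dx, dy, dz) in zip(key_vertices, offsets)]
```

In this layout, only one full mesh per chunk is stored; the other frames of the chunk only need their deformation data.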
Note that the current demo videos on our website include slight manual touch-up (mostly just selective smoothing of the geometry), since they were created to advertise our studio-based recording service, which includes some touch-up.
Feel free to ask if you would like to know more about a specific topic.